1. Introduction and results



In numerous applications one wants to compute the expectation of a function $f\colon D\to\mathbb{R}$ with respect to a probability measure $\pi$ defined on a measurable space $(D,\mathcal{D})$. The goal is to approximate

$$S(f) = \int_D f(x)\,\pi(dx), \qquad (1.1)$$

where we assume that it is not possible to sample $\pi$ directly at reasonable cost. In other words, we assume that there is no random number generator which generates a sample with respect to $\pi$ reasonably fast. This might happen if the available information on $\pi$ is incomplete or if the measurable space is complicated. However, many applications have in common that one knows enough about $\pi$ to design a Markov chain which approximates the desired distribution. Hence we assume that we cannot sample $\pi$ directly, but we can run a Markov chain to get close to $\pi$.
Let us briefly illustrate such problems: 

• Let $A\subset\mathbb{R}^d$ be an arbitrary convex body¹. Suppose that we can sample the uniform distribution on $A\cap\ell$ for an arbitrary line $\ell$. The goal is to simulate the uniform distribution on $A$, say $\mu_A$. For a complicated $A$ it might be impossible to generate a uniformly distributed sample at reasonable cost. But the hit-and-run algorithm (see Section 4.2) provides a Markov chain which has the limit distribution $\mu_A$.

• Let $D\subset\mathbb{R}^d$ be a convex body. Suppose that $f\colon D\to\mathbb{R}$ is an integrable function with respect to $\pi_\rho$, where $\rho$ is an unnormalized positive density and

$$\pi_\rho(A) = \frac{\int_A \rho(x)\,dx}{\int_D \rho(x)\,dx}, \qquad A\subseteq D \text{ measurable}.$$

The goal is to approximate

$$S(f,\rho) = \int_D f(x)\,\pi_\rho(dx) = \frac{\int_D f(x)\rho(x)\,dx}{\int_D \rho(x)\,dx}.$$

By the Metropolis algorithm based on the ball walk (see Section 4.1) one can construct a Markov chain which has the limit distribution $\pi_\rho$. It might be impossible to sample $\pi_\rho$ directly, in particular if $\rho$ is a complicated density function.

One can ask the following questions. How does the error of numerical integration based on Markov chains behave? And how long does the Markov chain need to get close to the limit distribution?

The thesis deals with the first question and, because of the close relation, briefly touches on the second one. The Markov chain Monte Carlo method for approximating the expectation plays a crucial role in computer science, in statistical physics, in statistics, and in financial mathematics, see e.g. [GRS96, Mar99, Liu08, Dia09, BGJM11]. Suppose that the function $f\colon D\to\mathbb{R}$ is given by an oracle which provides function values of $f$. The goal is to approximate $S(f)$. The integral simplifies to a sum if the state space $D$ is finite, such that

$$S(f) = \sum_{x\in D} f(x)\,\pi(x). \qquad (1.2)$$

¹ A convex body is a bounded convex set with non-empty interior.

We assume that the distribution $\pi$ can be simulated by a Markov chain $(X_n)_{n\in\mathbb{N}}$ with transition kernel $K$ and initial distribution $\nu$. The distribution $\pi$ is the limit distribution; in particular it is stationary, i.e.

$$\pi(A) = \int_D K(x,A)\,\pi(dx), \qquad A\in\mathcal{D}.$$

Under weak assumptions on the Markov chain we obtain that after sufficiently many steps $m\ge n_0$ the distribution of $X_m$ is close to $\pi$. The number $n_0$ determines the number of steps needed to get close to $\pi$; it is called the burn-in or the warm-up period. Afterwards, we approximate $S(f)$ by

$$S_{n,n_0}(f) = \frac{1}{n}\sum_{j=1}^{n} f(X_{j+n_0}).$$

It is well known that an ergodic theorem² holds which says that

$$\lim_{n\to\infty} S_{n,n_0}(f) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n} f(X_{j+n_0}) = \int_D f(x)\,\pi(dx) = S(f) \quad\text{almost surely.}$$

This means that the algorithm is well defined, but it does not imply an error bound: it is a qualitative rather than a quantitative result. We study the mean square error of $S_{n,n_0}$. For a function $f$ integrable with respect to $\pi$ it is given by

$$e_\nu(S_{n,n_0},f) = \left(\mathbb{E}_{\nu,K}\left|S_{n,n_0}(f) - S(f)\right|^2\right)^{1/2},$$

where $\mathbb{E}_{\nu,K}$ denotes the expectation of a Markov chain with transition kernel $K$ and initial distribution $\nu$.
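The estimator $S_{n,n_0}$ and its mean square error can be illustrated by a small simulation. The sketch below is not taken from the thesis: it assumes that the chain is given by a user-supplied one-step sampler `step(x, rng)` and an initial sampler for $\nu$ (all names are hypothetical), and it only approximates $e_\nu(S_{n,n_0},f)$ by averaging squared errors over independent runs.

```python
import random
from statistics import fmean

def mcmc_average(step, x1, f, n, n0, rng):
    """S_{n,n0}(f) = (1/n) * sum_{j=1}^{n} f(X_{j+n0}) for one run started at X_1 = x1."""
    x = x1
    for _ in range(n0):              # burn-in: move from X_1 to X_{n0+1}
        x = step(x, rng)
    total = 0.0
    for j in range(n):               # accumulate f(X_{n0+1}), ..., f(X_{n0+n})
        total += f(x)
        if j < n - 1:
            x = step(x, rng)
    return total / n

def empirical_rmse(sample_initial, step, f, exact, n, n0, reps, seed=0):
    """Monte Carlo estimate of e_nu(S_{n,n0}, f) from `reps` independent runs."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(reps):
        x1 = sample_initial(rng)     # X_1 ~ nu
        squared_errors.append((mcmc_average(step, x1, f, n, n0, rng) - exact) ** 2)
    return fmean(squared_errors) ** 0.5

# Sanity case: a "chain" that ignores its state and flips a fair coin (i.i.d. sampling),
# so the error should be close to sqrt(Var(f) / n) = sqrt(0.25 / 100) = 0.05.
coin_start = lambda rng: rng.randrange(2)
coin_step = lambda x, rng: rng.randrange(2)
rmse = empirical_rmse(coin_start, coin_step, lambda x: x, 0.5, n=100, n0=10, reps=2000)
```

For a genuine Markov chain the same functions apply unchanged; only `step` has to implement the transition kernel $K$.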

The main topic of the thesis is the presentation of old and new explicit error bounds for computing the expectation by Markov chain Monte Carlo. These bounds are in terms of the $\|\cdot\|_p$-norm of the integrand $f$,

$$\|f\|_p = \begin{cases}\left(\int_D |f(x)|^p\,\pi(dx)\right)^{1/p}, & p\in[2,\infty),\\ \pi\text{-}\operatorname{ess\,sup}_{x\in D}|f(x)|, & p=\infty.\end{cases}$$

The kernel $K$ of the Markov chain determines the Markov operator

$$Pf(x) = \int_D f(y)\,K(x,dy), \qquad x\in D,$$



² Suppose that $(D,\mathcal{D})$ is countably generated. Let the Markov chain $(X_n)_{n\in\mathbb{N}}$ be $\psi$-irreducible ($\psi$ is a non-trivial $\sigma$-finite measure, and for all $A\in\mathcal{D}$ and all $x\in D$ there exists an $n\in\mathbb{N}$ such that $\psi(A)>0$ implies $K^n(x,A)>0$). We assume that $\pi$ is a stationary distribution. Furthermore, for all $A\in\mathcal{D}$ with $\psi(A)>0$ and for all $x\in D$ we have $\Pr(X_n\in A \text{ infinitely often}\mid X_1=x) = 1$. Then $\lim_{n\to\infty}S_{n,n_0}(f) = S(f)$ almost surely. For a proof of this fact see [MT09, Theorem 17.1.7, p. 427]. For a simple approach to a similar ergodic theorem we refer to [AG10]. For a central limit theorem and fixed-width asymptotics of Markov chain Monte Carlo see [Gey92, JHCN06].
and $S(f) = \int_D f(x)\,\pi(dx)$ can be considered as an operator mapping into the constant functions. If $P$ is self-adjoint acting on $L_2$, then the Markov chain is called reversible. The asymptotic error is completely known if the underlying Markov chain is reversible, the initial distribution has a bounded density with respect to $\pi$, and one has $\|P^n - S\|_{L_1\to L_1}\le M\alpha^n$ for an $\alpha\in[0,1)$ and $M<\infty$, see Corollary 3.37. One obtains

$$\lim_{n\to\infty}\ \sup_{\|f\|_2\le 1} n\cdot e_\nu(S_{n,n_0},f)^2 = \frac{1+\Lambda}{1-\Lambda} \le \frac{2}{1-\Lambda} \qquad (1.3)$$

and

$$\lim_{n_0\to\infty}\ \sup_{\|f\|_2\le 1} e_\nu(S_{n,n_0},f)^2 = \frac{1+\Lambda}{n(1-\Lambda)} - \frac{2\Lambda(1-\Lambda^n)}{n^2(1-\Lambda)^2} \le \frac{2}{n(1-\Lambda)}, \qquad (1.4)$$

where $\Lambda = \sup\{\alpha \mid \alpha\in\operatorname{spec}(P-S)\}$. Similar asymptotic estimates are shown in [Sok97, Mat99, Bre99, Mat04, RR08]. However, we want to have explicit error bounds. The desired error estimate should behave asymptotically as described in (1.3) and (1.4). For $\Lambda$ close to 1 the right-hand sides of the equalities of the asymptotic error can be very well estimated by $\frac{2}{1-\Lambda}$ and $\frac{2}{n(1-\Lambda)}$, respectively. The main goal is to prove non-asymptotic, explicit error bounds with respect to $\|f\|_p$ of the form

$$\sup_{\|f\|_p\le 1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\Lambda)} + \frac{C_{\nu,p}\,\gamma^{n_0}}{n^2(1-\gamma)^2},$$

where $C_{\nu,p}$ and $\gamma<1$ should be known. If the initial distribution of the Markov chain is the stationary one, say $\pi$, then the influence of the initial part resulting from $\nu$ should vanish, i.e. $C_{\pi,p} = 0$. We give more details in the following.

First we consider the special case where the state space is finite. Let the cardinality of $D$ be astronomically large, say for example $|D| = 10^{\dots}$, such that an exact computation of the sum (1.2) might be practically impossible. Suppose that we have a Markov chain with transition matrix $P$ and initial distribution $\nu$. All definitions, such as stationarity, irreducibility and aperiodicity, and all relevant facts about Markov chains on finite state spaces are provided in Section 2.1. The Markov chain is reversible if the transition matrix $P = (p(x,y))_{x,y\in D}$ fulfills, for a probability measure $\pi$,

$$\pi(x)\,p(x,y) = \pi(y)\,p(y,x), \qquad x,y\in D.$$

If the Markov chain is reversible, then let us define

$$\beta = \|P-S\|_{\ell_2\to\ell_2} = \max\{\beta_1, |\beta_{|D|-1}|\},$$

where $\beta_1$ is the second largest and $\beta_{|D|-1}$ the smallest eigenvalue of $P$. We consider reversible and ergodic Markov chains, i.e. $\beta$, the second largest absolute eigenvalue of $P$, is less than 1. Hence also $\beta_1$, the second largest eigenvalue of $P$, is less than 1. Section 2.2 contains the first error estimate. The explicit error bound is developed with respect to the $\ell_2$-norm of the integrand $f\in\mathbb{R}^D$. With a constant $C$ depending on $\nu$ and $\pi$, whose explicit form is given in Theorem 2.20, we obtain there that

$$\sup_{\|f\|_2\le 1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\beta_1)} + \frac{2C\beta^{n_0}}{n^2(1-\beta)^2}. \qquad (1.5)$$
Obviously $C$ is 0 if $\nu$ is $\pi$, and the asymptotic estimates (1.3) and (1.4) hold true. However, the form of the constant $C$ is unsatisfactory for an extension to general state spaces. Furthermore, we also provide a lower bound of the error, see Remark 2.24. In Section 2.3 we suggest a choice of the burn-in. The main result is as follows.

Theorem 2.25. Suppose that

$$n_0\ge\frac{\log(C)}{\log(\beta^{-1})}.$$

Then

$$\sup_{\|f\|_2\le 1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\beta_1)} + \frac{2}{n^2(1-\beta)^2}.$$

The suggestion of the burn-in is optimal in the following sense. For $\eta>0$ let the number of steps $N = n + n_0$ of the Markov chain be large enough, let $\beta = \beta_1$, and assume that $C$ and $\beta$ obey an additional, less restrictive condition. Then the burn-in $n_{\mathrm{opt}}$ which minimizes the upper error bound of (1.5) satisfies $n_{\mathrm{opt}}\in[n_0,(1+\eta)n_0]$.
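To see how the burn-in enters (1.5), the following sketch evaluates its right-hand side and picks the smallest $n_0\ge 0$ with $C\beta^{n_0}\le 1$, which turns the second term into at most $\frac{2}{n^2(1-\beta)^2}$. It assumes that $\beta_1$, $\beta$ and $C$ are already known numbers; the function names are hypothetical and the rule is only the elementary one implied by (1.5), not the refined optimality analysis of Section 2.3.

```python
import math

def bound_15(n, n0, beta1, beta, C):
    """Right-hand side of (1.5): 2/(n(1-beta1)) + 2*C*beta^n0 / (n^2 (1-beta)^2)."""
    return 2.0 / (n * (1.0 - beta1)) + 2.0 * C * beta ** n0 / (n ** 2 * (1.0 - beta) ** 2)

def burn_in(C, beta):
    """Smallest integer n0 >= 0 with C * beta**n0 <= 1 (assumes C > 0 and 0 < beta < 1)."""
    if C <= 1.0:
        return 0
    n0 = math.ceil(math.log(C) / math.log(1.0 / beta))
    # Correct possible off-by-one effects of floating-point rounding.
    while C * beta ** n0 > 1.0:
        n0 += 1
    while n0 > 0 and C * beta ** (n0 - 1) <= 1.0:
        n0 -= 1
    return n0
```

For instance, `burn_in(10.0, 0.5)` returns 4, since $10\cdot 2^{-4}\le 1 < 10\cdot 2^{-3}$.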

In many examples an estimate for $\beta$ is available. In Section 2.4 we consider some illustrating examples where all eigenvalues and eigenvectors are known, so that the exact error is computable. Then we compare the lower and upper estimates with the exact error. It turns out that the estimates are sharp, depending on the available information about the eigenvalues. Similar estimates can be found in [Ald87] and [NP09]. However, the suggestion of the burn-in and the lower bound seem to be new.

After the study of Markov chains on finite state spaces, let us introduce the general state space setting. Assume that the measurable space $(D,\mathcal{D})$ is given. Then the desired expectation becomes an integral, see (1.1). Suppose we have a Markov chain with transition kernel $K$ and initial distribution $\nu$. Let us recall that the transition kernel $K$ defines the Markov operator

$$Pf(x) = \int_D f(y)\,K(x,dy), \qquad x\in D,$$

and $S(f) = \int_D f(x)\,\pi(dx)$ can be considered as an operator mapping into the constant functions. It is well known that reversibility of $K$ is equivalent to self-adjointness of $P$ acting on $L_2$. In Section 3.1 we provide all definitions, such as stationarity and reversibility, in detail. Furthermore, it contains all relevant convergence properties of Markov chains. Mainly the two convergence properties of Definition 3.14 and Definition 3.10 are essential:

• Let $\alpha\in[0,1)$ and $M<\infty$. The Markov chain is called $L_1$-exponentially convergent with $(\alpha,M)$ if

$$\|P^j - S\|_{L_1\to L_1}\le M\alpha^j, \qquad j\in\mathbb{N}.$$

For reversible Markov chains, $L_1$-exponential convergence with $(\alpha,2M)$ is equivalent to $\pi$-a.e. uniform ergodicity with $(\alpha,M)$, see Proposition 3.24.

• The Markov operator has an $L_2$-spectral gap if

$$\beta = \|P-S\|_{L_2\to L_2} < 1,$$

where the gap is given by $1-\beta$. The existence of an $L_2$-spectral gap implies exponential convergence of $P^j$ to $S$ with respect to the $L_2$-operator norm as $j\to\infty$.
Section 3.2 contains the error estimates for $S_{n,n_0}$. We explain the main results in the following. Let $\Lambda$ be the largest element of the spectrum of $P-S$ acting on $L_2$, i.e.

$$\Lambda = \sup\{\alpha \mid \alpha\in\operatorname{spec}(P-S)\}.$$

Suppose that the Markov chain is reversible and $L_1$-exponentially convergent with $(\alpha,M)$. Furthermore, assume that there exists a bounded density $\frac{d\nu}{d\pi}$ of the initial distribution $\nu$ with respect to $\pi$. For

$$C = M\left\|\frac{d\nu}{d\pi}-1\right\|_\infty$$

we show in Theorem 3.34 that the error obeys

$$\sup_{\|f\|_2\le 1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\Lambda)} + \frac{2C\alpha^{n_0}}{n^2(1-\alpha)^2}.$$

Note that the error bound is of the same form as for finite state spaces, except for the fact that the $\alpha$ of the $L_1$-exponential convergence appears. If the transition kernel is reversible, one has $\Lambda\le\beta$, and in Proposition 3.24 it is shown that $\beta\le\alpha$. Hence one can further estimate the leading term of the error bound by using $(1-\Lambda)^{-1}\le(1-\alpha)^{-1}$. Then a reasonable choice of the burn-in can be obtained by the same arguments as for finite state spaces. In Section 3.3 we also justify the choice of the burn-in by numerical experiments, which confirm the theoretical result.



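Theorem 3.45 (ii). Suppose that we have a Markov chain which is reversible with respect to $\pi$ and $L_1$-exponentially convergent with $(\alpha,M)$. Let

$$n_0 = \max\left\{\left\lceil\frac{\log\left(M\left\|\frac{d\nu}{d\pi}-1\right\|_\infty\right)}{\log(\alpha^{-1})}\right\rceil, 0\right\}.$$

Then

$$\sup_{\|f\|_2\le 1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\Lambda)} + \frac{2}{n^2(1-\alpha)^2}.$$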

The condition that the Markov chain is $L_1$-exponentially convergent with $(\alpha,M)$ is rather restrictive. This motivates the study of Markov chains which fulfill a weaker convergence property, namely we assume that there is an $L_2$-spectral gap. Let us provide the main result.



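Theorem 3.45 (iii). Suppose that we have a Markov chain with Markov operator $P$ which has an $L_2$-spectral gap $1-\beta>0$. For $p\in(2,\infty]$ let $n_0(p)$ be the smallest natural number (including zero) which is greater than or equal to

$$\frac{1}{\log(\beta^{-1})}\cdot\begin{cases}\dfrac{p}{2(p-2)}\log\left(\dfrac{32p}{p-2}\left\|\dfrac{d\nu}{d\pi}-1\right\|_{\frac{p}{p-2}}\right), & p\in(2,4),\\[8pt] \log\left(64\left\|\dfrac{d\nu}{d\pi}-1\right\|_{2}\right), & p\in[4,\infty].\end{cases}$$

Then

$$\sup_{\|f\|_p\le 1} e_\nu(S_{n,n_0(p)},f)^2 \le \dots$$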
For further details let us refer to Section 3.3, where we justify the choice of the burn-in $n_0(p)$ by numerical experiments. Briefly summarized, by weakening the convergence property we get an estimate of the error for a smaller class of functions. Namely, we prove an error bound for integrands $f$ which satisfy $\|f\|_p<\infty$, where $p>2$.
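The case distinction defining $n_0(p)$ in Theorem 3.45 (iii) can be evaluated mechanically once $\beta$ and the norm of $\frac{d\nu}{d\pi}-1$ are known. In the sketch below that norm is passed in as a plain number, `density_norm` (a hypothetical parameter); the caller is assumed to supply it with the index required for the given $p$.

```python
import math

def burn_in_p(p, beta, density_norm):
    """Smallest integer n0 >= 0 that is >= the expression defining n_0(p)."""
    if not p > 2:
        raise ValueError("p must lie in (2, inf]")
    if not 0.0 < beta < 1.0:
        raise ValueError("an L2-spectral gap 1 - beta > 0 with beta > 0 is assumed")
    if p < 4:
        t = p / (2.0 * (p - 2.0)) * math.log(32.0 * p / (p - 2.0) * density_norm)
    else:  # p in [4, inf], including p = math.inf
        t = math.log(64.0 * density_norm)
    return max(0, math.ceil(t / math.log(1.0 / beta)))
```

Both branches agree at $p=4$, since $\frac{p}{2(p-2)} = 1$ and $\frac{32p}{p-2} = 64$ there; for $p$ closer to 2 the required burn-in grows.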
The last chapter deals with applications. The problem of integration with respect to log-concave densities is the following. For a function $f\colon D\to\mathbb{R}$ and a convex body $D\subset\mathbb{R}^d$ the goal is to approximate

$$S(f,\rho) = \frac{\int_D f(x)\rho(x)\,dx}{\int_D \rho(x)\,dx},$$

where $\rho$ is an unnormalized density. The problem is linear in $f$ but not in $\rho$. Suppose that the domain $D$ is the $d$-dimensional Euclidean unit ball $B^d$. Furthermore, assume that $\rho$ is log-concave and $\log\rho$ is Lipschitz continuous with Lipschitz constant $L$. Hence we consider the class of densities

$$\mathcal{R}_L(B^d) = \left\{\rho>0 \;\middle|\; \rho \text{ is log-concave},\ |\log\rho(x)-\log\rho(y)|\le L\|x-y\|_2\right\},$$

where $\|\cdot\|_2$ denotes the Euclidean norm. We analyze the Metropolis algorithm based on a $\delta$ ball walk, see Algorithm 1 and the procedure Ball-Walk. The algorithm generates the desired sample. The sample, say $(X_{n_0+1},\dots,X_{n_0+n})$, is used to compute

$$S_{n,n_0}(f,\rho) = \frac{1}{n}\sum_{j=1}^{n} f(X_{n_0+j}).$$

For an adapted $\delta\asymp\min\{(d+1)^{-1/2}, L^{-1}\}$³, Mathé and Novak showed in [MN07] that the Markov chain which is defined by the Metropolis algorithm based on a $\delta$ ball walk has an $L_2$-spectral gap. This result is used to get an explicit error bound. We state the result for the unit ball, and for simplicity we consider integrands $f$ with $\|f\|_\infty\le 1$. For

$$n_0\asymp d\,L\max\{d,L^2\}$$

the error obeys

$$\sup_{\|f\|_\infty\le 1,\ \rho\in\mathcal{R}_L(B^d)} e\big(S_{n,n_0}(f,\rho)\big) \preceq \frac{1}{\sqrt{n}}\max\{\sqrt{d},L\} + \frac{1}{n}\max\{d,L^2\},$$

where $d\in\mathbb{N}$ and $L>0$. The geometry of the unit ball is essential for the estimate of the $L_2$-spectral gap of [MN07], since the ball walk might get stuck in domains which have corners. However, the results of Section 4.1 are slightly more general. There we treat balls with arbitrary radius $r>0$, and the result is with respect to $\|f\|_p$ for $p>2$. We refer to the corresponding theorem in Section 4.1 for the details. The number of function evaluations needed to obtain an error smaller than $\varepsilon$ is polynomially bounded in the dimension $d$ and the Lipschitz constant $L$. Hence the problem of integration with respect to a log-concave density is tractable, see Novak and Woźniakowski [NW08, NW10].
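A minimal sketch of a Metropolis algorithm based on a $\delta$ ball walk on $B^d$ is given below. It assumes the textbook form of the method (propose a uniform point in the ball of radius $\delta$ around the current state, stay if the proposal leaves $B^d$, otherwise accept with probability $\min\{1,\rho(y)/\rho(x)\}$); it is not a transcription of Algorithm 1, the function names are hypothetical, and `log_rho` is a user-supplied $\log\rho$. The step size `delta` is left free rather than set to the adapted $\delta$ above.

```python
import math
import random

def uniform_in_ball(d, radius, rng):
    """Uniform point in the d-dimensional Euclidean ball of the given radius around 0."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / d)      # radius with density proportional to r^(d-1)
    return [r * v / norm for v in g]

def metropolis_ball_walk_step(x, log_rho, delta, rng):
    """One Metropolis step with a delta ball walk proposal on D = B^d."""
    z = uniform_in_ball(len(x), delta, rng)
    y = [xi + zi for xi, zi in zip(x, z)]
    if sum(v * v for v in y) > 1.0:             # proposal left the unit ball: stay at x
        return x
    log_ratio = log_rho(y) - log_rho(x)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return y                                # accepted with probability min{1, rho(y)/rho(x)}
    return x

def ball_walk_estimate(f, log_rho, d, delta, n, n0, seed=0):
    """S_{n,n0}(f, rho) for a chain started at the centre of B^d."""
    rng = random.Random(seed)
    x = [0.0] * d
    for _ in range(n0):
        x = metropolis_ball_walk_step(x, log_rho, delta, rng)
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = metropolis_ball_walk_step(x, log_rho, delta, rng)
    return total / n
```

With `log_rho = lambda x: 0.0` the target is the uniform distribution on $B^d$; for $d=2$ the mean of $x_1^2$ is then $1/4$, which gives a simple check of the sampler.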

The problem of integration on a convex body is as follows. Let $A\subset\mathbb{R}^d$ be a convex body. The goal is to compute

$$S(f,A) = \frac{1}{\operatorname{vol}_d(A)}\int_A f(x)\,dx,$$

³ We use the notation $\preceq$ and $\asymp$ as follows. Let $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ be positive sequences. We write $a_n\preceq b_n$ if there exists an absolute constant $c$ such that $a_n\le c\,b_n$ for all $n\in\mathbb{N}$. We write $a_n\asymp b_n$ if $a_n\preceq b_n$ and $b_n\preceq a_n$.
where $\operatorname{vol}_d(A)$ denotes the $d$-dimensional volume of $A$. In other words, the goal is to approximate the expectation of $f$ with respect to the uniform distribution on $A$, say $\mu_A$. The problem is linear in $f$ but not in $A$. Let $B^d\subset A\subset rB^d$, where $rB^d$ is the Euclidean ball with radius $r$ around 0. We assume that there is an oracle $\mathrm{Or}_A(\ell)$ which returns a uniformly distributed state on $A\cap\ell$ for an arbitrary line $\ell$. Hence we consider state spaces from the class

$$\mathcal{S}_d(r) = \{A\subset\mathbb{R}^d \text{ convex} \mid B^d\subset A\subset rB^d\},$$

and we assume that $\mathrm{Or}_A(\ell)$ is available for any $A\in\mathcal{S}_d(r)$. We analyze the hit-and-run algorithm, see Algorithm 2 on page 87 and the procedure Hit-and-Run. It generates the desired sample, say $(X_{n_0+1},\dots,X_{n_0+n})$. Afterwards we compute

$$S_{n,n_0}(f,A) = \frac{1}{n}\sum_{i=1}^{n} f(X_{n_0+i}).$$
The Markov chain generated by the hit-and-run algorithm has the right stationary distribution, see Lemma 4.10 or [Smi84]. A result of Lovász and Vempala presented in [LV06] provides an estimate of $1-\beta$. Hence there exists an $L_2$-spectral gap and we can apply the error bound of Theorem 3.45 (iii). For simplicity, suppose that $\|f\|_\infty\le 1$. For

$$n_0\asymp d^3r^2\log(r)$$

the error obeys

$$\sup_{\|f\|_\infty\le 1,\ A\in\mathcal{S}_d(r)} e\big(S_{n,n_0}(f,A)\big) \preceq \frac{dr}{\sqrt{n}} + \frac{d^2r^2}{n}.$$

For the general result with respect to $\|f\|_p$ with $p>2$ we refer to Theorem 4.12. The number of function evaluations needed to obtain an error smaller than $\varepsilon$ is polynomially bounded in the dimension $d$ and the radius $r$. Hence the problem of integration on a convex body is tractable, see [NW08, NW10].
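The hit-and-run step can be sketched for one concrete body, the cube $A = [-1,1]^d$, which satisfies $B^d\subset A\subset\sqrt{d}\,B^d$ and for which the oracle $\mathrm{Or}_A(\ell)$ is easy to implement: intersect the line with each slab $|x_i|\le 1$ and draw a uniform point on the resulting chord. The choice of body and all names are assumptions for illustration; this is not Algorithm 2 itself.

```python
import math
import random

def random_direction(d, rng):
    """Uniformly distributed point on the unit sphere S^{d-1}."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def cube_chord(x, theta):
    """Interval [t_lo, t_hi] of all t with x + t*theta in [-1, 1]^d (x inside the cube)."""
    t_lo, t_hi = -math.inf, math.inf
    for xi, ti in zip(x, theta):
        if abs(ti) > 1e-15:                      # coordinate i actually changes along the line
            a, b = (-1.0 - xi) / ti, (1.0 - xi) / ti
            t_lo, t_hi = max(t_lo, min(a, b)), min(t_hi, max(a, b))
    return t_lo, t_hi

def hit_and_run_step(x, rng):
    """Random line through x, then a uniform point on its intersection with the cube."""
    theta = random_direction(len(x), rng)
    t_lo, t_hi = cube_chord(x, theta)
    t = rng.uniform(t_lo, t_hi)                  # plays the role of the oracle Or_A(line)
    return [xi + t * ti for xi, ti in zip(x, theta)]

def hit_and_run_estimate(f, d, n, n0, seed=0):
    """S_{n,n0}(f, A) for A = [-1, 1]^d and a chain started at 0."""
    rng = random.Random(seed)
    x = [0.0] * d
    for _ in range(n0):
        x = hit_and_run_step(x, rng)
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = hit_and_run_step(x, rng)
    return total / n
```

For $f(x) = x_1^2$ the exact value is $S(f,A) = 1/3$, which the estimate should approach.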
2. Finite state spaces



In the following we study the mean square error of Markov chain Monte Carlo methods on finite state spaces. In Section 2.1 the basic definitions and properties of Markov chains on finite state spaces are stated. The estimate of the mean square error is shown in Section 2.2. We suggest and justify a recipe for choosing the burn-in. Afterwards, the error bound is applied to illustrating examples, and finally we discuss how the results fit into the published literature.

2.1. Markov chains

In this section the basics of Markov chains on finite state spaces are provided. Let $D$ be a finite set and $\mathcal{P}(D)$ the power set of $D$, such that the measurable state space $(D,\mathcal{P}(D))$ is given.

Definition 2.1 (Markov chain). A sequence of random variables $(X_n)_{n\in\mathbb{N}}$ on a probability space $(\Omega,\mathcal{F},\Pr)$ mapping into $(D,\mathcal{P}(D))$ is called a Markov chain with transition matrix $P = (p(x,y))_{x,y\in D}$ if for all $n\in\mathbb{N}$, all $x,y\in D$ and all $x_1,\dots,x_{n-1}\in D$ with

$$\Pr(X_1=x_1,\dots,X_{n-1}=x_{n-1},X_n=x) > 0$$

one has

$$\Pr(X_{n+1}=y\mid X_1=x_1,\dots,X_{n-1}=x_{n-1},X_n=x) = \Pr(X_{n+1}=y\mid X_n=x) = p(x,y).$$

All entries of the transition matrix $P$ are non-negative and the rows sum up to 1. For $x,y\in D$ the value $p(x,y)$ is the probability of jumping from state $x$ to state $y$ in a single step of the chain. The distribution

$$\nu(x) = \Pr(X_1=x), \qquad x\in D,$$

is called the initial distribution.

Suppose that we have a transition matrix $P$ and a probability measure $\nu$. Any transition matrix $P$ has a random mapping representation, see [LPW09, Proposition 1.5, p. 7]. A random mapping representation of $P$ on the state space $D$ is a function $\Phi\colon D\times[0,1]\to D$ which satisfies

$$\Pr(\Phi(x,Z)=y) = p(x,y), \qquad x,y\in D,$$

where $Z\colon(\Omega,\mathcal{F},\Pr)\to([0,1],\mathcal{B}([0,1]))$ is a uniformly distributed random variable and $\mathcal{B}([0,1])$ denotes the Borel $\sigma$-algebra. Then a Markov chain can be constructed as follows. If $(Z_n)_{n\in\mathbb{N}}$ is a sequence of i.i.d. random variables with uniform distribution and $X_1$ has distribution $\nu$ and is independent of $(Z_n)_{n\in\mathbb{N}}$, then it is easy to see that $(X_n)_{n\in\mathbb{N}}$ defined by

$$X_n = \Phi(X_{n-1},Z_n), \qquad n\ge 2,$$

is a Markov chain with transition matrix $P$ and initial distribution $\nu$.
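One concrete random mapping representation is inverse-CDF sampling of the rows of $P$: $\Phi(x,z)$ is the first state $y$ at which the cumulative row sum $p(x,0)+\dots+p(x,y)$ exceeds $z$. The sketch below uses this choice (states are numbered $0,\dots,|D|-1$, and the function names are hypothetical) to build the chain $X_n = \Phi(X_{n-1},Z_n)$.

```python
import random

def random_mapping(P):
    """Phi(x, z) by inverse-CDF sampling of row x, so Pr(Phi(x, Z) = y) = p(x, y) for Z ~ U[0, 1)."""
    def phi(x, z):
        cumulative = 0.0
        for y, pxy in enumerate(P[x]):
            cumulative += pxy
            if z < cumulative:
                return y
        # Only reached through rounding when z is very close to 1:
        # return the last state with p(x, y) > 0.
        return max(y for y, pxy in enumerate(P[x]) if pxy > 0)
    return phi

def simulate_chain(P, x1, n, rng):
    """X_1 = x1 and X_k = Phi(X_{k-1}, Z_k) for k = 2, ..., n with i.i.d. uniform Z_k."""
    phi = random_mapping(P)
    xs = [x1]
    for _ in range(n - 1):
        xs.append(phi(xs[-1], rng.random()))
    return xs
```

Drawing `x1` from $\nu$ before calling `simulate_chain` gives a chain with initial distribution $\nu$.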
In the following assume that we have a Markov chain $(X_n)_{n\in\mathbb{N}}$ with transition matrix $P$ and initial distribution $\nu$. The expectation $\mathbb{E}_{\nu,P}$ is taken with respect to the joint distribution of $(X_n)_{n\in\mathbb{N}}$, say $W_{\nu,P}$, which is defined on $(D^{\mathbb{N}},\sigma(\mathcal{A}))$, where

$$D^{\mathbb{N}} = \{\omega = (x_1,x_2,\dots)\mid x_i\in D \text{ for all } i\ge 1\}$$

and

$$\mathcal{A} = \bigcup_{k\in\mathbb{N}}\{A_1\times A_2\times\cdots\times A_k\times D\times\cdots \mid A_i\in\mathcal{P}(D),\ i=1,\dots,k\}.$$

For $k\in\mathbb{N}$ and $A_1\times\cdots\times A_k\subset D^k$ one has

$$W_{\nu,P}(A_1\times\cdots\times A_k\times D\times\cdots) = \sum_{x_1\in A_1}\cdots\sum_{x_k\in A_k}\Pr(X_1=x_1,\dots,X_k=x_k).$$

If the initial distribution $\nu$ is $\delta_x$, the point mass at $x\in D$, we say that the Markov chain starts at $x$. By

$$Pf(x) = \sum_{y\in D} f(y)\,p(x,y) = \sum_{y\in D} f(y)\Pr(X_2=y\mid X_1=x) = \mathbb{E}_{\delta_x,P}[f(X_2)]$$

one has the expectation of $f\in\mathbb{R}^D$ after a single step of the chain which starts at $x\in D$. The probability to get from $x$ to $y$ in $k\ge 2$ steps is

$$\Pr(X_{k+1}=y\mid X_1=x) = \sum_{x_2\in D}\sum_{x_3\in D}\cdots\sum_{x_k\in D} p(x,x_2)\,p(x_2,x_3)\cdots p(x_k,y) = p^k(x,y),$$

where $P^k = (p^k(x,y))_{x,y\in D}$ denotes the $k$th power of $P$. Then

$$P^kf(x) = \sum_{y\in D} f(y)\,p^k(x,y) = \sum_{y\in D} f(y)\Pr(X_{k+1}=y\mid X_1=x) = \mathbb{E}_{\delta_x,P}[f(X_{k+1})]$$

is the expectation after $k$ steps of the Markov chain which starts at $x$. Similarly, we consider the application of $P$ to a probability measure $\nu$, i.e.

$$\nu P(x) = \sum_{y\in D} p(y,x)\,\nu(y) = \sum_{y\in D}\Pr(X_2=x\mid X_1=y)\,\nu(y) = \Pr(X_2=x).$$

This is the distribution which arises after a single transition where the initial state is chosen by $\nu$. The distribution which arises after $k\ge 1$ steps is given by

$$\nu P^k(x) = \sum_{y\in D} p^k(y,x)\,\nu(y) = \sum_{y\in D}\Pr(X_{k+1}=x\mid X_1=y)\,\nu(y) = \Pr(X_{k+1}=x).$$
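The two ways in which $P$ acts, on functions from the left ($P^kf$) and on measures from the right ($\nu P^k$), can be written out directly for a transition matrix stored as a list of rows. The sketch below (function names hypothetical) also makes visible the duality $\sum_x \nu P^k(x)f(x) = \sum_x \nu(x)P^kf(x)$, which is $\mathbb{E}_{\nu,P}[f(X_{k+1})]$ computed in two ways.

```python
def apply_to_function(P, f):
    """(P f)(x) = sum_y p(x, y) f(y): expected value of f after one step started at x."""
    return [sum(pxy * fy for pxy, fy in zip(row, f)) for row in P]

def apply_to_measure(nu, P):
    """(nu P)(x) = sum_y p(y, x) nu(y): distribution of X_2 when X_1 ~ nu."""
    m = len(P)
    return [sum(nu[y] * P[y][x] for y in range(m)) for x in range(m)]

def k_step_function(P, f, k):
    """P^k f, i.e. x -> E_{delta_x, P}[f(X_{k+1})]."""
    for _ in range(k):
        f = apply_to_function(P, f)
    return f

def k_step_measure(nu, P, k):
    """nu P^k, i.e. the distribution of X_{k+1} when X_1 ~ nu."""
    for _ in range(k):
        nu = apply_to_measure(nu, P)
    return nu
```

For $P = \begin{pmatrix}0.9 & 0.1\\ 0.5 & 0.5\end{pmatrix}$ the vector $(5/6, 1/6)$ is left unchanged by `k_step_measure`, i.e. it is a stationary distribution in the sense of Definition 2.3.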

In the following we present properties of transition matrices. 

Definition 2.2 (irreducibility, aperiodicity, periodicity). A transition matrix $P$ is called irreducible if for all $x,y\in D$ there exists a $k\in\mathbb{N}$ such that

$$p^k(x,y) > 0, \qquad\text{where } P^k = (p^k(x,y))_{x,y\in D}.$$

A transition matrix $P$ is called aperiodic if for all $x\in D$ we have

$$\gcd(\{k\in\mathbb{N}\mid p^k(x,x)>0\}) = 1,$$

where gcd denotes the greatest common divisor. If $P$ is not aperiodic, we call it periodic.
If a transition matrix is irreducible, then one can reach every state $y$ from every state $x$ in finitely many steps. Aperiodicity ensures that, for every $m>1$, the numbers of steps in which the chain can return to a given state are not all contained in $\{m,2m,3m,\dots\}$.

Definition 2.3 (stationarity). Let $\pi$ be a probability measure on $D$. Then $\pi$ is called a stationary distribution of a transition matrix $P$ if

$$\pi P(x) = \pi(x), \qquad x\in D.$$

If the initial distribution of a Markov chain with transition matrix $P$ is a stationary one, say $\pi$, then after a single transition the same distribution as the initial one appears, i.e.

$$\Pr(X_1=x) = \pi(x) = \pi P(x) = \Pr(X_2=x), \qquad x\in D.$$

Definition 2.4 (reversibility). Let $\pi$ be a probability measure on $D$. A transition matrix $P$ is called reversible with respect to $\pi$ if

$$\pi(x)\,p(x,y) = \pi(y)\,p(y,x), \qquad x,y\in D.$$

If a transition matrix $P$ is reversible with respect to a probability measure $\pi$, then $\pi$ is a stationary distribution (see [LPW09, Proposition 1.19, p. 14]). If the initial distribution of a Markov chain with transition matrix $P$ is $\pi$, then reversibility with respect to $\pi$ is equivalent to

$$\Pr(X_1=x,X_2=y) = \Pr(X_1=y,X_2=x), \qquad x,y\in D.$$

Definition 2.5 (lazy version). Let $P$ be a transition matrix and let $I$ be the identity matrix. Then we call

$$\tilde P = \frac{1}{2}(I+P)$$

the lazy version of $P$.

Let $\pi$ be a stationary distribution of a transition matrix $P$; then $\pi$ is also stationary with respect to $\tilde P$. If $P$ is irreducible, reversible with respect to $\pi$ and periodic, then one can pass over to the lazy version $\tilde P$ and obtains that $\tilde P$ is irreducible, reversible with respect to $\pi$ and aperiodic.

A Markov chain is called irreducible, periodic, aperiodic, or reversible with respect to $\pi$ if the corresponding transition matrix is irreducible, periodic, aperiodic, or reversible with respect to $\pi$, respectively.
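The passage to the lazy version can be checked on a small example with exact rational arithmetic. The sketch below (names hypothetical) uses the simple random walk on a cycle with four states, which is irreducible, reversible with respect to the uniform distribution and periodic with period 2. The function `period` only takes the gcd over return times $k\le$ `kmax`, a truncation of Definition 2.2 that is adequate for such small examples.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lazy_version(P):
    """tilde P = (I + P) / 2."""
    m = len(P)
    return [[(P[x][y] + (1 if x == y else 0)) / 2 for y in range(m)] for x in range(m)]

def period(P, x, kmax):
    """gcd of all k <= kmax with p^k(x, x) > 0; the value 1 means aperiodic at x."""
    Pk, returns = P, []
    for k in range(1, kmax + 1):
        if Pk[x][x] > 0:
            returns.append(k)
        Pk = matmul(Pk, P)
    return reduce(gcd, returns, 0)

def is_stationary(pi, P):
    m = len(P)
    return all(sum(pi[y] * P[y][x] for y in range(m)) == pi[x] for x in range(m))

def is_reversible(pi, P):
    m = len(P)
    return all(pi[x] * P[x][y] == pi[y] * P[y][x] for x in range(m) for y in range(m))

# Simple random walk on a cycle with 4 states: irreducible, reversible, periodic.
m = 4
half = Fraction(1, 2)
P = [[half if (y - x) % m in (1, m - 1) else Fraction(0) for y in range(m)] for x in range(m)]
pi = [Fraction(1, m)] * m
Q = lazy_version(P)
```

Here `period(P, 0, 8)` is 2 while `period(Q, 0, 8)` is 1, and the uniform distribution remains stationary and reversible for `Q`.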

Let us state some well-known implications of the different properties. For proofs or more details see [Bre99, Str05, LPW09]. For every transition matrix there exists a stationary distribution, and if the matrix is irreducible, then there exists a unique stationary distribution, which is positive ([LPW09, Proposition 1.14, p. 12 and Corollary 1.17, p. 14]). Note that if $\lambda$ is an eigenvalue of a transition matrix $P$, then $|\lambda|\le 1$ ([LPW09, Lemma 12.1(i), p. 153]). Furthermore, for irreducible transition matrices 1 is a simple eigenvalue ([LPW09, Lemma 12.1(ii), p. 153]). If the Markov chain is aperiodic and irreducible, then $-1$ is not an eigenvalue of $P$ ([LPW09, Lemma 12.1(iii), p. 153] or [Str05, Theorem 5.1.14, p. 113]). These eigenvalue results are also known as consequences of the Perron-Frobenius theorem, see [Sen06].

In the following we always assume that the Markov chains are irreducible, aperiodic and reversible with respect to a probability measure $\pi$. Hence $\pi$ is the stationary distribution.
Aperiodicity is not essential. For a Markov chain with periodic transition matrix $P$ and initial distribution $\nu$ we may consider a lazy Markov chain, i.e. a chain with aperiodic transition matrix $\tilde P$, the lazy version of $P$, and initial distribution $\nu$.

Let us define a weighted scalar product for $f,g\in\mathbb{R}^D$ by

$$\langle f,g\rangle = \sum_{x\in D} f(x)\,g(x)\,\pi(x)$$

and let $\|f\|_2 = \langle f,f\rangle^{1/2}$. By considering the scalar product it is easy to see that reversibility is equivalent to $P$ being self-adjoint. Applying the spectral theorem for self-adjoint transition matrices and the fact that the Markov chain is irreducible, one obtains that $P$ has real eigenvalues

$$1 = \beta_0 > \beta_1\ge\beta_2\ge\cdots\ge\beta_{|D|-1}\ge -1.$$

If the transition matrix is aperiodic, then $\beta_{|D|-1} > -1$. There exists a basis of orthonormal eigenfunctions (vectors) $\{u_0,u_1,\dots,u_{|D|-1}\}$, i.e. for $i,j\in\{0,\dots,|D|-1\}$ one has

$$Pu_i = \beta_i u_i \qquad\text{and}\qquad \langle u_i,u_j\rangle = \begin{cases}1, & i=j,\\ 0, & i\ne j.\end{cases}$$

Clearly, $u_0(x) = 1$ for all $x\in D$ and $S(u_i) = \langle u_i,u_0\rangle = 0$ for $i\in\{1,\dots,|D|-1\}$. By the spectral structure of the transition matrix one has

$$P^k = (p^k(x,y))_{x,y\in D} = \left(\sum_{i=0}^{|D|-1}\beta_i^k\,u_i(x)\,u_i(y)\,\pi(y)\right)_{x,y\in D}, \qquad (2.1)$$

see [Bre99, p. 203] for details.
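Formula (2.1) can be checked numerically on the reversible two-state chain $P = \begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix}$ with $a,b>0$, whose spectral data are known in closed form: $\pi = \left(\frac{b}{a+b},\frac{a}{a+b}\right)$, $\beta_1 = 1-a-b$, $u_0 = (1,1)$ and $u_1 = \left(\sqrt{a/b},-\sqrt{b/a}\right)$, which is orthonormal to $u_0$ in $\langle\cdot,\cdot\rangle$. The example and the function names are illustrative assumptions.

```python
import math

def two_state(a, b):
    """Two-state chain with its stationary distribution and orthonormal eigenbasis."""
    P = [[1 - a, a], [b, 1 - b]]
    pi = [b / (a + b), a / (a + b)]
    betas = [1.0, 1.0 - a - b]
    u = [[1.0, 1.0], [math.sqrt(a / b), -math.sqrt(b / a)]]
    return P, pi, betas, u

def matrix_power(P, k):
    m = len(P)
    R = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(k):
        R = [[sum(R[i][l] * P[l][j] for l in range(m)) for j in range(m)] for i in range(m)]
    return R

def spectral_power(pi, betas, u, k):
    """Right-hand side of (2.1): entries sum_i beta_i^k u_i(x) u_i(y) pi(y)."""
    m = len(pi)
    return [[sum(betas[i] ** k * u[i][x] * u[i][y] * pi[y] for i in range(m))
             for y in range(m)] for x in range(m)]
```

For every $k$, `matrix_power(P, k)` and `spectral_power(pi, betas, u, k)` should agree up to rounding.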

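For $p\in[1,\infty]$ let

$$\|f\|_p = \begin{cases}\left(\sum_{x\in D}|f(x)|^p\,\pi(x)\right)^{1/p}, & p\in[1,\infty),\\ \sup_{x\in D}|f(x)|, & p=\infty.\end{cases}$$

The weighted vector space $\ell_p = \ell_p(D,\pi)$ is defined by the normed space $(\mathbb{R}^D,\|\cdot\|_p)$. Furthermore, let

$$\ell_p^0 = \ell_p^0(D,\pi) = \{f\in\ell_p\mid S(f) = 0\}.$$

Then

$$\ell_2 = \ell_2^0\oplus(\ell_2^0)^\perp, \qquad\text{with}\quad (\ell_2^0)^\perp = \{f\in\mathbb{R}^D\mid f = c,\ c\in\mathbb{R}\} = \operatorname{Eig}(P,1),$$

where $\operatorname{Eig}(P,1)$ is the eigenspace of $P$ with respect to the eigenvalue 1. Note that for the next well-known result it is not assumed that the transition matrix is reversible with respect to $\pi$.

Lemma 2.6. Let $p\in[1,\infty]$ and $f\in\mathbb{R}^D$. For any transition matrix $P$ with stationary distribution $\pi$ one obtains

$$\|Pf\|_p\le\|f\|_p \qquad\text{and}\qquad \|P\|_{\ell_p\to\ell_p} = 1.$$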
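Proof. By the Jensen inequality⁴ (J) and stationarity (stat.) one has for $p\in[1,\infty)$

$$\|Pf\|_p^p = \sum_{x\in D}\Big|\sum_{y\in D} f(y)\,p(x,y)\Big|^p\pi(x) \overset{\text{(J)}}{\le} \sum_{x\in D}\sum_{y\in D}|f(y)|^p\,p(x,y)\,\pi(x) \overset{\text{(stat.)}}{=} \sum_{y\in D}|f(y)|^p\,\pi(y) = \|f\|_p^p.$$

If $p=\infty$, then

$$\|Pf\|_\infty = \sup_{x\in D}|Pf(x)| \le \sup_{x\in D}\sum_{y\in D}|f(y)|\,p(x,y) \le \|f\|_\infty.$$

Since $\|Pf\|_p\le\|f\|_p$ and $Pu_0 = u_0$ with $\|u_0\|_p = 1$, we have $\|P\|_{\ell_p\to\ell_p} = 1$. □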

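Let us briefly explain how to quantify the difference of two distributions. For any measure $\nu$ on $D$ let $\frac{\nu}{\pi}$ denote the function $x\mapsto\frac{\nu(x)}{\pi(x)}$, so that

$$\left\|\frac{\nu}{\pi}\right\|_2 = \left(\sum_{x\in D}\frac{\nu(x)^2}{\pi(x)}\right)^{1/2}.$$

If $\nu$ is a probability measure on $D$, then the quantity $\left\|\frac{\nu}{\pi}-1\right\|_2$ is related to the $\chi^2$-contrast, defined as follows.

Definition 2.7 ($\chi^2$-contrast). The $\chi^2$-contrast of a distribution $\nu$ and a positive distribution $\mu$ is defined by

$$\chi^2(\nu,\mu) = \sum_{x\in D}\frac{(\nu(x)-\mu(x))^2}{\mu(x)}.$$

The $\chi^2$-contrast is not symmetric and therefore not a distance. By a simple calculation one obtains

$$\chi^2(\nu,\pi) = \left\|\frac{\nu}{\pi}-1\right\|_2^2.$$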

The functional $S$ can be interpreted as an operator which maps into the constant functions; consequently, one can see

$$S = (\pi(y))_{x,y\in D}$$

as a matrix. Furthermore, let

$$\beta = \max\{\beta_1, |\beta_{|D|-1}|\}$$

be the second largest absolute value of the eigenvalues of $P$. Now we state a property of the matrix $P-S$.

Lemma 2.8. Let $P$ be a reversible transition matrix with respect to $\pi$. Then

$$\|P^n - S\|_{\ell_2\to\ell_2} = \|P^n\|_{\ell_2^0\to\ell_2^0} = \beta^n, \qquad n\in\mathbb{N}. \qquad (2.2)$$



⁴ Let $(D,\mathcal{D},\mu)$ be a probability space. For any convex function $h\colon\mathbb{R}\to\mathbb{R}$ and for any function $f\colon D\to\mathbb{R}$ that is integrable with respect to $\mu$, the Jensen inequality is $h\left(\int_D f\,d\mu\right)\le\int_D (h\circ f)\,d\mu$.
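Proof. The self-adjointness of $P$ implies $\|P\|_{\ell_2^0\to\ell_2^0} = \max\{\beta_1,|\beta_{|D|-1}|\} = \beta$; consequently, $\|P^n\|_{\ell_2^0\to\ell_2^0} = \beta^n$. By

$$\|P^n-S\|_{\ell_2\to\ell_2} = \sup_{\|f\|_2\le 1}\|(P^n-S)f\|_2 = \sup_{\|f\|_2\le 1}\|P^n(f-S(f))\|_2 \le \sup_{\|g\|_2\le 1,\ S(g)=0}\|P^ng\|_2 = \|P^n\|_{\ell_2^0\to\ell_2^0},$$

where the inequality uses $\|f-S(f)\|_2\le\|f\|_2$, and

$$\|P^n\|_{\ell_2^0\to\ell_2^0} = \sup_{\|g\|_2\le 1,\ S(g)=0}\|P^ng\|_2 = \sup_{\|g\|_2\le 1,\ S(g)=0}\|P^ng-S(g)\|_2 \le \sup_{\|f\|_2\le 1}\|(P^n-S)f\|_2 = \|P^n-S\|_{\ell_2\to\ell_2},$$

claim (2.2) is shown. □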



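This section is finished by stating a well-known fact which shows that $\nu P^k$ converges to $\pi$ exponentially fast in $k$ if $\beta<1$.

Corollary 2.9. Let $P$ be a transition matrix and $\nu$ a probability measure on $D$. Let $P$ be reversible with respect to $\pi$. Then

$$\left\|\frac{\nu P^k}{\pi}-1\right\|_2 \le \beta^k\left\|\frac{\nu}{\pi}-1\right\|_2, \qquad k\in\mathbb{N}.$$

Proof. By reversibility (rev.) we have $\frac{\nu P^k(x)}{\pi(x)} = \sum_{y\in D}\frac{p^k(y,x)\,\nu(y)}{\pi(x)} = \sum_{y\in D}p^k(x,y)\frac{\nu(y)}{\pi(y)} = P^k\left(\frac{\nu}{\pi}\right)(x)$. Since $P^k 1 = 1$ and $\frac{\nu}{\pi}-1\in\ell_2^0$, the assertion is proven by

$$\left\|\frac{\nu P^k}{\pi}-1\right\|_2 \overset{\text{(rev.)}}{=} \left\|P^k\left(\frac{\nu}{\pi}-1\right)\right\|_2 \le \|P^k\|_{\ell_2^0\to\ell_2^0}\left\|\frac{\nu}{\pi}-1\right\|_2 \overset{(2.2)}{=} \beta^k\left\|\frac{\nu}{\pi}-1\right\|_2.$$

□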
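Corollary 2.9 can be observed exactly, in rational arithmetic, on a small reversible chain. The sketch assumes the lazy random walk on a path with three states, $P = \begin{pmatrix}1/2 & 1/2 & 0\\ 1/4 & 1/2 & 1/4\\ 0 & 1/2 & 1/2\end{pmatrix}$, which is reversible with respect to $\pi = (1/4,1/2,1/4)$ and has eigenvalues $1, 1/2, 0$, so $\beta = 1/2$. Since $\chi^2(\nu,\pi) = \|\frac{\nu}{\pi}-1\|_2^2$, the corollary says $\chi^2(\nu P^k,\pi)\le\beta^{2k}\chi^2(\nu,\pi)$.

```python
from fractions import Fraction

P = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 2), Fraction(1, 2)]]
pi = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
beta = Fraction(1, 2)                     # eigenvalues of P are 1, 1/2, 0

def chi2(nu, mu):
    """chi^2-contrast of Definition 2.7."""
    return sum((n - m) ** 2 / m for n, m in zip(nu, mu))

def step_measure(nu, P):
    """nu -> nu P."""
    return [sum(nu[y] * P[y][x] for y in range(len(P))) for x in range(len(P))]

nu = [Fraction(1), Fraction(0), Fraction(0)]   # chain started in state 0
contrasts = [chi2(nu, pi)]
for k in range(1, 11):
    nu = step_measure(nu, P)
    contrasts.append(chi2(nu, pi))

# Squared form of Corollary 2.9: chi^2(nu P^k, pi) <= beta^(2k) * chi^2(nu, pi).
holds = all(c <= beta ** (2 * k) * contrasts[0] for k, c in enumerate(contrasts))
```

Starting from the point mass in state 0 one gets $\chi^2 = 3$, then $1/2$ after one step and $1/8$ after two steps, below the bounds $3/4$ and $3/16$.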



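2.2. Error bounds

In this section explicit error bounds are proven. Let us repeat the idea of Markov chain Monte Carlo. Suppose we have a Markov chain $(X_n)_{n\in\mathbb{N}}$ with transition matrix $P$ and initial distribution $\nu$, where $\pi$ is a stationary distribution, and we compute

$$S_{n,n_0}(f) = \frac{1}{n}\sum_{j=1}^{n} f(X_{j+n_0})$$

as an approximation of $S(f) = \sum_{x\in D} f(x)\,\pi(x)$. The error is measured in the mean square sense, i.e.

$$e_\nu(S_{n,n_0},f) = \left(\mathbb{E}_{\nu,P}\left|S_{n,n_0}(f) - S(f)\right|^2\right)^{1/2}.$$

Now let us present a helpful result.

Lemma 2.10. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition matrix $P$ and initial distribution $\nu$. Then for $i,j\in\mathbb{N}$ with $j\le i$ it follows that

$$\mathbb{E}_{\nu,P}[f(X_j)f(X_i)] = \sum_{x\in D} f(x)\,P^{i-j}f(x)\,\nu P^{j-1}(x). \qquad (2.3)$$

Moreover, if $\pi$ is a stationary distribution and $\nu = \pi$, then

$$\mathbb{E}_{\pi,P}[f(X_j)f(X_i)] = \langle f,P^{i-j}f\rangle. \qquad (2.4)$$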
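Proof. The calculation

$$\begin{aligned}\mathbb{E}_{\nu,P}[f(X_j)f(X_i)] &= \sum_{x_1\in D}\cdots\sum_{x_i\in D} f(x_j)\,f(x_i)\,p(x_{i-1},x_i)\cdots p(x_1,x_2)\,\nu(x_1)\\ &= \sum_{x_1\in D}\cdots\sum_{x_j\in D} f(x_j)\,P^{i-j}f(x_j)\,p(x_{j-1},x_j)\cdots p(x_1,x_2)\,\nu(x_1)\\ &= \sum_{x\in D} f(x)\,P^{i-j}f(x)\,\nu P^{j-1}(x)\end{aligned}$$

proves (2.3), and by using $\pi P(x) = \pi(x)$ one can see (2.4). □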



In the following a special case of the method $S_{n,n_0}$ is considered. In this case the initial distribution is a stationary one; thus, the distribution after a single transition does not change. Hence it is not necessary to do any burn-in, i.e. $n_0 = 0$. Afterwards, the error representation of the special case is set in relation to the error where the initial distribution might differ from a stationary one. The techniques which are used are adapted from [Rud09] and [Rud10].

In the following $S_{n,0}$ is always denoted as $S_n$. Let us start with a result stated in [BD06, Proposition 2.1, p. 3].

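Proposition 2.11. Let $f\in\mathbb{R}^D$ and let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition matrix $P$ and initial distribution $\pi$. Let $P$ be reversible with respect to $\pi$. Then

$$e_\pi(S_n,f)^2 = \frac{1}{n^2}\sum_{k=1}^{|D|-1}a_k^2\,W(n,\beta_k), \qquad (2.5)$$

where

$$a_k = \langle f,u_k\rangle \qquad\text{and}\qquad W(n,\beta_k) = \frac{n(1-\beta_k^2) - 2\beta_k(1-\beta_k^n)}{(1-\beta_k)^2}.$$

Proof. Let us consider $g = f - S(f)\in\mathbb{R}^D$. The error obeys

$$e_\pi(S_n,f)^2 = \mathbb{E}_{\pi,P}\Big|\frac{1}{n}\sum_{i=1}^n g(X_i)\Big|^2 = \frac{1}{n^2}\sum_{i=1}^n\mathbb{E}_{\pi,P}[g(X_i)^2] + \frac{2}{n^2}\sum_{j=1}^{n-1}\sum_{i=j+1}^n\mathbb{E}_{\pi,P}[g(X_j)\,g(X_i)].$$

By using the orthonormal basis $\{u_0,u_1,\dots,u_{|D|-1}\}$ we have $g(x) = \sum_{k=1}^{|D|-1}a_ku_k(x)$. For $j\le i$ one obtains

$$\mathbb{E}_{\pi,P}[g(X_j)\,g(X_i)] \overset{(2.4)}{=} \langle g,P^{i-j}g\rangle = \sum_{k=1}^{|D|-1}\sum_{l=1}^{|D|-1}a_ka_l\,\langle u_k,P^{i-j}u_l\rangle = \sum_{k=1}^{|D|-1}\sum_{l=1}^{|D|-1}a_ka_l\,\beta_l^{i-j}\langle u_k,u_l\rangle = \sum_{k=1}^{|D|-1}a_k^2\,\beta_k^{i-j}.$$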
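The last two equalities follow from $Pu_l = \beta_lu_l$ and from the orthonormality of the basis of eigenvectors. Altogether this gives

$$\begin{aligned}e_\pi(S_n,f)^2 &= \frac{1}{n^2}\sum_{k=1}^{|D|-1}a_k^2\Big(n + 2\sum_{j=1}^{n-1}\sum_{i=j+1}^{n}\beta_k^{i-j}\Big) = \frac{1}{n^2}\sum_{k=1}^{|D|-1}a_k^2\Big(n + 2\,\frac{(n-1)\beta_k - n\beta_k^2 + \beta_k^{n+1}}{(1-\beta_k)^2}\Big)\\ &= \frac{1}{n^2}\sum_{k=1}^{|D|-1}a_k^2\,W(n,\beta_k).\end{aligned}$$

□

Let us consider $W(n,\beta_k)$ to simplify and interpret Proposition 2.11.

Lemma 2.12. For all $n\in\mathbb{N}$ and $k\in\{1,\dots,|D|-1\}$ it follows that

$$W(n,\beta_k)\le W(n,\beta_1)\le\frac{2n}{1-\beta_1}. \qquad (2.6)$$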



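Proof. We will show that the mapping $x\mapsto W(n,x)$ is increasing on $[-1,1)$, so that $W(n,\beta_k)\le W(n,\beta_1)$. Let $x\in[-1,1)$. For $i\in\{0,\dots,n\}$ we have $1-x^i\ge 0$ and $x^{n-i}\le 1$. This implies

$$(1-x^i)\,x^{n-i}\le 1-x^i, \qquad\text{i.e.}\qquad x^i + x^{n-i}\le 1+x^n,$$

and therefore

$$x^j + x^{j+1} + x^{n-j-1} + x^{n-j}\le 2(1+x^n), \qquad j\in\{0,\dots,n-1\}.$$

Now

$$(1+x)\sum_{j=0}^{n-1}x^j = \frac{1}{2}\sum_{j=0}^{n-1}\left(x^j + x^{j+1} + x^{n-j-1} + x^{n-j}\right)\le n(1+x^n),$$

and, since $1-x^n = (1-x)\sum_{j=0}^{n-1}x^j$, a direct computation gives

$$\frac{\partial}{\partial x}W(n,x) = \frac{2}{(1-x)^2}\Big(n(1+x^n) - (1+x)\sum_{j=0}^{n-1}x^j\Big)\ge 0,$$

so the first inequality of the assertion is proven. Using $0\le 1-x^n\le n(1-x)$ for $x\in[-1,0]$, we obtain

$$W(n,x)\le\begin{cases}\dfrac{n(1+x)-2xn}{1-x}\le\dfrac{2n}{1-x}, & x\in[-1,0],\\[8pt] \dfrac{n(1+x)}{1-x}\le\dfrac{2n}{1-x}, & x\in(0,1),\end{cases}$$

and everything is shown. □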
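Proposition 2.11 and Lemma 2.12 can be cross-checked numerically. The sketch below compares the closed form of $W(n,x)$ with the double sum $n + 2\sum_{l=1}^{n-1}(n-l)x^l$ from the proof of Proposition 2.11, and tests monotonicity in $x$ and the bound $W(n,x)\le\frac{2n}{1-x}$ on a grid in $[-1,0.99]$; the grid and the tolerances are arbitrary choices for floating-point arithmetic.

```python
def W(n, x):
    """Closed form of W(n, x) from Proposition 2.11."""
    return (n * (1 - x * x) - 2 * x * (1 - x ** n)) / (1 - x) ** 2

def W_direct(n, x):
    """n + 2 * sum_{l=1}^{n-1} (n - l) x^l, the double sum in the proof of Proposition 2.11."""
    return n + 2 * sum((n - l) * x ** l for l in range(1, n))

def check_lemma_2_12(n, grid):
    """Return (closed form agrees, increasing in x, bounded by 2n/(1-x)) on the given grid."""
    agree = all(abs(W(n, x) - W_direct(n, x)) <= 1e-8 * max(1.0, abs(W_direct(n, x)))
                for x in grid)
    values = [W(n, x) for x in grid]
    increasing = all(v1 <= v2 + 1e-9 for v1, v2 in zip(values, values[1:]))
    bounded = all(W(n, x) <= 2 * n / (1 - x) + 1e-9 for x in grid)
    return agree, increasing, bounded
```

For instance, $W(1,x) = 1$ for all $x$, and $W(2,x) = 2+2x$, which vanishes at $x=-1$.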



An explicit formula of the error is established if the initial state is chosen by a stationary distribution. Let us consider the maximal error of $S_n$ for $f$ which satisfy $\|f\|_2\le 1$.

Corollary 2.13. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition matrix $P$ and initial distribution $\pi$. Let $P$ be reversible with respect to $\pi$. Then

$$\sup_{\|f\|_2\le 1}e_\pi(S_n,f)^2 = \frac{1+\beta_1}{n(1-\beta_1)} - \frac{2\beta_1(1-\beta_1^n)}{n^2(1-\beta_1)^2}\le\frac{2}{n(1-\beta_1)}. \qquad (2.7)$$

Proof. The individual error of $f$ is

$$e_\pi(S_n,f)^2 \overset{(2.5)}{=} \frac{1}{n^2}\sum_{k=1}^{|D|-1}a_k^2\,W(n,\beta_k) \le \frac{\|f\|_2^2}{n^2}\max_{k=1,\dots,|D|-1}W(n,\beta_k) \overset{(2.6)}{=} \|f\|_2^2\left(\frac{1+\beta_1}{n(1-\beta_1)} - \frac{2\beta_1(1-\beta_1^n)}{n^2(1-\beta_1)^2}\right),$$

where $a_k$ is chosen as in Proposition 2.11 and therefore $\sum_{k=1}^{|D|-1}a_k^2\le\|f\|_2^2$. By the preceding analysis of the individual error we have, for $\|f\|_2\le 1$, the right upper error bound. Now we consider $f = u_1$, where $\|u_1\|_2 = 1$. By applying (2.5) we have

$$e_\pi(S_n,u_1)^2 = \frac{1+\beta_1}{n(1-\beta_1)} - \frac{2\beta_1(1-\beta_1^n)}{n^2(1-\beta_1)^2}.$$

Thus the equality of (2.7) is proven, and by (2.6) the inequality is shown. □



In Corollary[2J3]an explicit error bound with respect to II II2 is shown. Notice that the first 
part of l|2.7> is an equality, which means that the integration error is completely known if 
the initial distribution is stationary. 

Suppose that the distribution n can be simulated directly, i.e. we can apply a Monte 
Carlo method with an i.i.d. sample. Then an i.i.d. sequence {Xn)neN, where every X„ 
is distributed with respect to n, is a Markov chain with transition matrix S = {iriy) )x,yeD 
and initial distribution n. In this setting one has 

eASnJ)' = -\\.f-S{f)\\l. 
n 

This corresponds to /3j = for all i > 0. In some artificial cases other Markov chain Monte 
Carlo methods can do better. For example if there is a Markov chain where (3i < and 
the target is to approximate S{ui) or if all eigenvalues are smaller than zero. A simple 
transition matrix which satisfies this eigenvalue condition is given by 



/ 



P 



\D\-1 





|D|-1 



\D\-1 



\D\ 



\D\-1 

J 



where Tr{x) = l/\D\ for all x e D, see [FHY92, Remark 3, p. 617]. It turns out that 
/3i = • • • = = ^ \D\-i - '^''9® 1^1 unfortunately not possible to construct a 

transition matrix where (3i is close to -1. 

Proposition 2.14. Let P be an irreducible transition matrix. Tlien 

1 



Pi > 



\D\ 



1 



Proof. Since /?o = 1 one has 

|Z5|-1 

1+ E 



Then 



\D\-1 

J2 l^i^ trace (P) = ^ p{x, x) > 0. 

i=0 xeD 



\D\-1 

-l< E ^ i\D\-l)Pi. 



□ 
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The error estimates under the assumption that the initial distribution is a stationary one 
seem to be very restrictive. If we could sample tt directly we would approximate S{f ) by 
Monte Carlo with an i.i.d. sample. However, even if it is possible it might happen that the 
sampling procedure is computationally expensive. It can be reasonable to generate only 
the initial state by sampling from n and afterwards run a Markov chain with stationary dis- 
tribution TT. Perfect sampling might be helpful for the construction of such direct sampling 
procedures, see lPW96>|Hag02j . 

In the following we consider the case, where the initial distribution is not necessarily 
stationary. Let i/ be a distribution on D and fc e N. Then we define 

dk{x) = J2 44(P'(^'2/) - <y)) = ^'(-)(-^) - 1 = iP' -S)i-- l)ix), X e D. 
yen 



If P is reversible with respect to tt, then we obtain 

^pk 



\dk\\2 = 



- 1 



fc e N, 



thus dk determines the difference between vP^ and the stationary distribution tt. Addi- 
tionally by the spectral representation of P^ (see l|2.H ) one obtains 



\D\-1 \D\-1 

' V 



dkix)^ J2 /3f ^w.(y)Ky)"^(2:)= E l3-{-,u,)u,ix), xeD. (2.8) 

i=l y£D i=l 

The next statement gives a relation between e^{Sn,noif) and e7r(5'„, /). 

Proposition 2.15. Let f e and g = f - S{f). Let (X„)„gN be a Markov chain with 
transition matrix P and initial distribution v. Let P be reversible with respect to n. Then 



j=l j=l k=j + l 



where 



U{h)^{d,,h) = (J(P*-S')(^-i),/iJ>, /ieM^,ieN. 
Proof. It is easy to see that 

E.,P \S{f) - Sn^noif)? = -^EEE.,p[g(X„„+,)5(X„„+,)] 



7i~ 1 n 



Recall that reversibility with respect to tt is equivalent to self-adjointness (s-a) of P. For 
every function e and i e N the following calculation holds 

-) = {P'^h^) + lp^h^ 

TT / ^ ' \ TT 



E {P''h){x) v{x) = [P^h, -) = {P% 1) + (^P% - - 1 
- (P'/i, 1) + Ip\- - 1), /^) = (P^K 1) + ((^^' - S)C- - 1), /^ 

(s-a) ^ ' \ TT / ^ ' \ TT 

= Y,^P"h){x)T:[x) + {d,,h) . 



xeD 
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Formula i2.9\ is shown by using tine previous calculation for h = g'^ and h = gP'' ^g. □ 



Corollary 2.16. Under the same assumptions as in Proposition 12. 751 we obtain for i e 

{1, . . . , \D\ - 1} that 



l + A 2/3,(1 - /3f) , 1 " /l + /3,-2/3: 



nil-Pi) n2(l-/3,)2 n 



„2 



n-j+l 



1-A 



-'j+no 



where 



\D\-1 \D\-1 

1=1 1=1 xeD 



(2.10) 



Proof. By substituting 



1 + 2/3,(1-/3") 



n(l-/3,) n2(l-/3,)2 



and 

n— 1 n 



L,+„„(uf)(/3,-/3r^+i) 



1-A 



into l |2.9t one obtains the error formula. The equality of Lk{uf) is an implication of 



□ 



Equation | |2.9t and the result of Corollary [2.1 61 are still exact error formulas. To get an 
upper bound for the error, we estimate the functional Lk{-). This estimate depends on the 
speed of convergence of vP'' to n. 

Lemma2.17. Let h eR^ , k eN and recall that (3 = max {i3i,\i3\d\_i\}. Then 
\Lk{h)\<l3'^ 







V 




1 






Jh\\,<f3' 


1 












TT 




TT 


oo 



(2.11) 



Proof. After applying Cauchy-Schwarz inequality (CS) to Lk{h) = (dfe, h) one obtains 

V 



\U{h)\ < \\dkhM2<\\P' -s\\ 

(CS) 



1 



\h\\ 



2 • 



By Lemma |Z8] the first inequality is proven and the rest is shown by using WhW^ < 



- II 

TT 1 1 OO 



□ 



The last lemma ensures an exponential decay of Lk{-) for increasing fc e N. This fact is 
used to show that there exists a constant C^.t^jj, which is independent of n and no, such 
that 



\e^iSn,nQ,ff - e^(S'„,/)^| < C^^T^^fj 11/11 



2 ^ 

2 n2 ■ 



An immediate consequence of the inequality is an explicit error bound. The following two 
lemmas imply such an inequality and provide Cj^^^r.^ explicitly. 
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Lemma 2.18. Let (X„)„gN be a Markov chain with transition matrix P and initial distribu- 
tion V. Let P be reversible with respect to tt. Let / e and 

n n— 1 n 

j=i j=i k=j+i 

Then 



- 1 



|e.(5„,„o,/)'-e,(5„,/)2| < C/(/3,n) 
Proof. Let 5 = / - The equation ll2.9t implies 

Then by l|2.1H one gets 



(2.12) 



n— 1 n 



j=i k=j+i 



- - 1 



By the Cauchy-Schwarz inequality (CS) and ||£0_^<>( 



Ml. 

p''-^ it follows that 



Let eo 



(CS) 



- l|L^"°. Then 



j+no 

3 = 1 i=l k=j+l 

n n — 1 n 

<£o||.9|l2E/5'+2£"ll3ll2E E 

(n n— 1 n 

2 



<t/(/3,n).eo||/|i2. 
The last inequality follows from ||/ - S'(/)||2 < II/II2 



□ 



Note that one can also get a similar estimate as in i|2.12) with respect to \\f\\^ by using 
the first inequality of \2A 1 > instead of the second one. In the resulting estimate the factor 

^^11^11^ does not appear. 

Let us consider U{f5,n). If /3 < 1, then the mapping n ^ U{l3,n) is bounded. 
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Lemma 2.19. Let I3 < 1. For all n we have 
Proof. By the infinite geometric series one obtains 



From Lemma IZTS] and Lemma |2TT9] it follows that 



□ 



2/3"«^ _ ^ 



If the initial distribution is tt then the error can be represented as in Proposition 12.11! 
and bounded as in Corollary [2. 131 

The next theorem summarizes the main result of this section. 

Theorem 2.20. Let / e and let {Xn)neN be a Markov chain with transition matrix P 
and initial distribution v. Let P be reversible with respect to tt and let /3 <l. Then 



Forak = {f,Uk) one has 

lim n ■ e^{Sn.no, ^ 1™ n • e^(S'„, /)^ = — . (2.14) 

fc=l ^ 

Proof. By L emm a lZTSl Coroll ary IZTSl and Lemma l2T9]the e stimate of (IZTSt is proven. 
By Lemma [2T8I and Lemma I27T9] the first equality of i|2.14| i holds. Then, by Proposi- 
tion [OT] 

lim n-e^{Snjf= V □ 

n-!-oo ^ l~ Pk 

Remark 2.21. The error bound (12.1 3t can be interpreted as follows: The burn-in ?io is 
necessary to eliminate the influence of the initial distribution v, while n must be large 
to decrease e^(5„,/). Unfortunately the dependence of the initial distribution on the 
estimate is disillusioning for an extension to general state spaces, because of the factor 

One can avoid this factor if one considers error bounds with respect to 

with p> 2, see Section 

Another consequence of Lemma lzTsl and Lemma l2T9] is the following result concerning 
the asymptotic error for \\f\\^ < 1. 

Corollary 2.22. Under the same assumptions as in Theorem [2.2CM t follows that 



2/3"%/ - I - - 1 



l2 H ru2 



lim n ■ sup e^{Sn,no,f) 



2 _ ! + /?! 
Il/ll2<l ^-Pl 

and 



hm sup e^(Sn,noJ) ^ FT WyI' 

ll/ll2<i n(l-/3i) n^{l-piY 
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Proof. Let us define 

2/3"%A^ I- - l|lo 

^ V TT OO I TT 112 



oo 



One hasJim„^oon- • c„,„ and lim„o^oo c„,„o = 0. For ||/||2 < 1 we obtain by 
Lemma IZTSl and Lemma |2T9] tliat 



Hence 



sup e^(5„,/)^ - c„,„o < sup e^{Sn,„o, f)^ < sup e^(S'„, /)^ + c„,„o. (2.15) 

Il/ll2<l i!/ll2<l Il/ll2<l 



2_ l + A 2/3i(l-/3n 
7i(l~/3i) n2(l-/3i)2' 



By Corollary [2. 13| we have 

sup e^{SnJ) 

Il/ll2<l 

By taking the limits in (12.1 5t the assertions are proven. □ 
Remark 2.23. The number 

is called an autocorrelation time of P, see [Sok97| IMat99l . If one could sample from n 
then /?! = so that t = 1. Hence t is the factor of computing time which quantifies 
the asymptotic difference of Markov chain Monte Carlo compared to Monte Carlo with an 
i.i.d. sample from the distribution n. 

Remark 2.24. Observe that one obtains from (|2.15t a lower error bound for Sn,no- We 
have 

with Cn,n„ defined as in the proof of Corollary 12.221 For a reasonable burn-in of the 
Markov chain the error can be effectively approximated by these estimates. We apply 
these estimates to illustrating examples, see Section |Z4] Now let us discuss which burn- 
in is reasonable. 



2.3. Burn-in 

Assume that computer resources for N steps of the Markov chain are available, i.e. N = 
n + no- The goal is to choose the burn-in uq and the number n such that the upper error 
bound is as small as possible. There is obviously a trade-off between the choice of n and 
no. In the next statement the error for an explicitly given burn-in is stated. 

Theorem 2.25. Suppose tliat 



no 



log( 



TT OO 



log(/3-i) 



,0 
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Then 



< sup eu{Sn,najY < 



,<1 



n{l-l3i) n2(l-/3)2 



Proof. The assertion follows from Theorem |2.20| and Remark l2.24l □ 

Note that log(/3-i) = (1 - /3) + ^^tt^ and log(^-i) > 1 - /3. One might use this 
observation to estimate the suggested burn-in. The choice of the burn-in of Theorem|Z25] 
is justified by the following. 
Let us define 





1 






/ 










TT 


oo 


TT 



c 



and assume that Pi = p. If the assumption does not hold we may estimate the error 
bound of Theorem [2120] by using (1 - Pi)-^ < (1 - PY^- For ||/||2 < 1 we want to 
minimize the error estimate 



est(n,no) = 



^ n(l-/3) ?i2(l-/?)2 
Lemma 2.26. For?? > let 



under the constraint that A^ = n + no. 



C > 



1-/3 



^ > + i^i^ + 2[iog(r^) - (1 - P)]-'- 



(2.16) 
(2.17) 



Then there exists an 



log(g) , ^ log(g) 

log(/3-i)'^ "^"^^logl/?-!) 



i/i/'/?/c/? minimizes the mapping no i-> est(A^ -no, no). 

If 77 = 10-3, then (IZTet implies for /3 = 0.99 that C > 152 and for C = lO^o that /3 > 0.87. 
Hence the assumptions are not restrictive, since p is usually close to 1, C is larg^ and 
the computational resources N should be sufficiently large. 



Proof. Let 



a = iV- (1 + 77) 



log(C) 



and N 



log(C) 



'log(ri) log(/?-i)' 

Note that l|2.17t gives that 6 > a > 0. It is enough to show that there exists an mopt e [a, b] 
which minimizes n 1-^ est^(7i) given by 



We have 



ostein) = (est(n,iV-n))2 



as^(„)' = ias.^„) = ;^ 



+ 



2Cp 



N-n 



n{l-P) ?l2(l-/3)2 



(1-/3) 



iog(r ') - - - 1 

n 



^The constant C might depe nd exponentially on additional parameters, see the example "Random walk on the 
hypercube" in Section [?!4l or see Sectionlj] 
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We will show for any a < a and b>b that 

est^(a)' < and est^(6)' > 0, 

consequently there exists an mopt € [a,b] which minimizes est^(n). Let b > b. Then the 
inequality est^(6)' > follows by (j2.17t and 



N > 



log(C) (log^C) + ( 



1 - 



1-/3 
log(/3-i 



N > 



(l-I5iP%)l»g(g)+2 
log(/3-i)-(l-/3) 

iV(log(r')-(l-/3))+log(C) 

log(/3-i) 
61og(r')-^'(l-/3) >2 

log(r - ? > (1 - /3) 



log(r')- pv- 



1-/9 
log(/3-i) 

log(C) 



log(/3~i) 



- Ij > 2 

(1-/3) > 2 



1 



1-/3 
l-f3 



log(r') - 



C/3 



N-b 



1-/3 
1 > 0. 



los 



1 > 



On the other hand for a < a we obtain est^(a)' < 0. This is shown by the following 
calculation. By | |2.16t one has 

log(/3-^) 

ini (1-/3) 
^ iog(r')-(i-/?)c" <o 

=^ log(/3-i) - (1 - /3)C'' < - 



\ogir')--<{i-m' 

a 



(W) 



log(r')- 



- 1 = 



-1- ^ 



(1-/3) 



log(/3-^) - 



- 1 < 



log(/3-i) - ^ - 1< 0. 
a ,' 



(1-/3) 

Altogether this implies that there is an 



i°g(g) n I 

log(ri)'^ ^'^^logCri) 



which minimizes the mapping uq ^ es\.{N - no, no)- 



□ 



If an error of at most e e (0, 1) is desired, then the suggested choice of the burn-in no is 
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independent of the precision e, we cinoose 



no ~ max 




and 



n > 



1 + VI + 4e2 



to aclnieve 



2.4. Examples 

Tine goal is to compare tine upper bounds of Tiieorem [2.201 and Tiieorem [2.25| witli tine 
exact error for a given function / e K-^. It is not known which / with ||/||2 < 1 maximizes 
e:.{S„.naJf- But by Corollary|222one has 



where ui is the eigenfunction corresponding to /3i. This motivates the study of the indi- 
vidual error for ui, which gives the maximal error for integrands / with ||/||2 < 1 if no goes 
to infinity. In this section illustrating examples are considered, where the eigenvalues and 
the eigenfunctions are available. The Markov chains are very well studied in the literature, 
see iMei99. ,SC04. ,Str05. ,BD06. ,LPW09j . 

Random walk on a circle 

Let T > 3 be an odd natural number. Let D = Zt be the underlying state space, where 
Zt = Z mod T denotes the cyclic group of order T. The T x T transition matrix of the 
random walk is determined by 



The transition matrix is reversible with respect to the uniform distribution given by Tr{x) = 
1/Tfor X £ D. Since T is an odd number we obtain that the transition matrix is aperiodic, 
for even T it would be periodic. The eigenvalues of the transition matrix are 



lim sup ei.(S'„,„o,/)^ = e^(S'„,Mi)^, 

■°^°°lt/ll2<l 




y = X ±1 mod T, 

otherwise. 




T -1 



2 



and the orthonormal eigenfunctions {uq, ui,. . ., ut^i} are 




where j = 1,...,^ 



and X eD. Clearly /3 = \I3t-i\ = cos(f ), thus /3 7^ /3i. 
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Let us consider / = ui. The initial distribution is cinosen asv = 5^, so tinat tine initial state 
is G £). By = uo + it is 



(wi, {uif) = {ui,uo) + (wj, Us) ^ l^, « = 3, 



1, ^ = 0, 

0, Otherwise. 



T-l 



Hence by (12.1 Ot we obtain 

z=l 

Additionally with Pi = cos(^) it is 

l + cos(f) 2cos(f )(l-cos"(^)) 

The exact error is determined by Corollary [2. ISI with = cos(^) so that 



(S'n, Ml) 



n^-^V 1 — COS! 41^) / J / 

We apply Theorem [2.251 to get a lower error bound and (12.1 3t of Theorem [2.201 to get 
an upper error bound, since /3 ^ Pi. Hence the burn-in is chosen as suggested in 
Theorem lZ25] i.e. 

■1 iog(T2-r) 



no 



21og(cos-i(f)) 



Then 



and 



1/2 



l + cos(^) 4 



ui). (2.20) 



2LAA2 / — '^i'\^n,no^ 



l(l-COs(f)) 7l2(l-COs(f))2 

We have an explicit exact error formula (12.1 8t , a lower error bound (|2.20t and an upper 
error bound (12.1 9t . 

In FigureETIthe different bounds of (i2T9t , (IZ20t and the exact error of dZTSt are plotted 
for T = 999. The curves start at iV = no, since the computational resources must be 
larger than the burn-in no = 1396699. The lower error bound gives a non-trivial estimate 

if iV > no + 1617911 = 3014610, since for n > (^i+^p^^^l'} = 1617911 one obtains a lower 
bound larger as zero. 
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lower bound (2.20) 

— exact error (2.18) 
- - upper bound (2.19) 



10 



N = no+n 



10 



Figure 2.1 .: Random walk on a circle: Exact error and error bounds, T 

"i log(r^-r) 

2 log(cos-i(f )) 



999 and 



no = 



= 1396699. 



Random walk on the hypercube 



Let d be a natural number. Let D = {0, 1} be the state space and |i| = J2i=i \^^\ foi" 
i e {-1, 0, 1}''. The 2'' X 2*^ transition matrix is given by 



x = y, 



p{x,y) 



f 1 

2' 



1, 



10, 



otherwise. 



The transition matrix is reversible with respect to tt{x) = 2 ior x e D. Furthermore, it 
is aperiodic and irreducible. We use a different notation for the index of the eigenvalues 
and orthonormal eigenfunctions, for z e {0, 1}'' one has 

= 1 - -i^ and u,ix) = (-1)^-1 ^'^S xgD. 

Set [0] = (0, . . . , 0) and set [1] = (1, 0, . . . , 0) so that 

/3[o] = 1, and U[o](a;) = 1, x G D, 

/3[i] = l-^, and U[i](a;) = (-l)^S x G D. 



Obviously for all indizes z e {0, 1} the eigenvalue >0so that /3[i] = /3. 
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Let us choose the initial state of the Markov chain deterministically in (0, . . . , 0) e D, i.e. 

v = S[o]- By = u[o] one has for index z e {0, l}'^ that 



1, z = [0], 
0, otherwise. 



This implies 



Lfc((u[i])2) =0, fceN 
The error of S„ , if the initial state is chosen by tt, obeys 

2_2d-l 2{(f-d) 



1-1- 



e7r(5'„,W[i]) 
Then by Corollary IzTs] it is 

e,y(S'n,no, U[l]) = e7r(S'„, 

The burn-in and the error bounds are determined by Theorem |2.25l One obtains 

■ 1 log(22'* - 2^^) " 



(2.21) 



no 



2 log(l - 



such that 



and 



2d 2d2 

e^[bn.no,U[l]) < y — H 5", 



2d-l 4d2 



(2.22) 



(2.23) 



n 

In Fiaure lZ2] for d = 50 the exact error (|2.21 > , the upper error bound (j2.22| and the lower 
error bound ll2.23l i are plotted. It can be seen that after the burn-in the curves are close 
to each other. The error bounds are polynomial in d which is of the magnitude of logdZ?!). 



Random walk on the star 

Let T > 2 be an even natural number. Let the state space D = {0,1,..., T}. The 
(T + 1) X (T + 1) transition matrix is given by 




x^O, yeD\{0}, 
xeD\{0}, y = 0, 
Otherwise, 



with a parameter 6 e (0,1). The transition graph is star shaped since every state is 
connected solely with the center 0. The transition matrix is reversible with respect to tt, 
for x e Z? given by 

= I 2^6' 3; = 0, 

1 T(2-e) ' otherwise. 
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10 



I 



10 



10 




lower bound (2.23) 

exact error (2.21) 

upper bound (2.22) 



N = no+n 



10 



Figure 2.2.: Random walk on the hypercube: Exact error and error bounds, d = 50 and 

" 1 log(2^^--2^" 
2 log(l-A)-i 



no = 



= 1716. 



One obtains /3o = 1, /3t = 6* - 1 and for a; e -D one has 



uo{x) = 1, ut{x) = \/l — 6 ■ 



1, x = 0, 

othenwise. 



The eigenvalue ft = for i e {l, ... ,T - 1} is of multiplicity T - 1. Without loss of 
generality we may assume that for any x € D one has 



ui{x) 



0, a; = 0, 

i_g ) x = 1, . . . , r/2, 

x = T/2 + l,...,T. 



The remaining eigenvectors U2, . . . ,ut-i are arbitrarily chosen such that we get an or- 
thonormal basis {uo,«i, • ■ • .ut}. One has an aperiodic and irreducible transition matrix 
where /3i = and /5 = max{/3i, |/3t|} = 1-6. We consider the error for / = m. The initial 
state is given as the center of the star, i.e. 0. Then u = So- From (ui)^ = uo 
one gets 



Ut 



1, 



0, 



0, othenwise. 
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By i lzTOt this implies 



1=1 

Tine error winere tine Markov cinain is initialized by the stationary distribution obeys 



e^(5„,ui)^ = -. 

n 

Then by Corollary [2T6] it follows that 

e.(5„,„o,«i) = j • (2.24) 

Recall that f3i ^ (3. However, we only use the error bounds of Theorem |2.25l The burn-in 
is chosen as 

^\og{{2 -e)Ty 

no - 

Then the upper bound is 



21og(l-6i)-i 



2 2 

e.(5„,„o,"i) < \/^ + ^, (2.25) 



and the lower bound is given as 



1 4 

'oT^ - ei'('S'«,no,wi). (2.26) 

In Figure|Z3]for 6 = .1 and T = 10^ the exact error l|Z24) , the upper error bound l |Z25t 
and the lower bound l l2.26t are plotted. For n > ^ we get a nontrivial estimate by the 
lower bound. The upper error bound is shifted down since p ^ (3i. One could improve this 
by using ll2.13t of Theorem r2.20l directlv. In the present setting one looses asymptotically 
a factor of 



Let us summarize the important facts of this section. The error was considered for the 
eigenfunction ui corresponding to /3i. If no goes to infinity, then ui is the function which 
maximizes the error for integrands / with ||/||2 < 1. The bound of Theorem l2.25l applied 
in this setting gives tight results if = (3. Otherwise Theorem [2.201 achieves the right 
asymptotic coefficient if (3i and (3 are known. For the considered examples one has 
the eigenvalues and the eigenfunctions explicitly. In applications it is usually difficult to 
estimate /3i or /3, but there are different auxiliary tools, e.g. canonical path technique, 
conductance (see [JS89] and [DS91]), log-Sobolev inequalities and path coupling, see 
ILPW091 . However, if the eigenvalues Pi and I3\d\-i are available, then the error can be 
approximated by the lower and upper bound. 



2.5. Notes and remarks 

Let us comment how the results fit into the published literature. An elementary and pow- 
erful technique how to bound the error for Sn,no or Sn is based on Doeblin's theory, see 
IStrOSI pp. 27]. Let Ak = {ak{x,y))x,y(^D be the Mh Cesaro sum given by 

fc-i 

3=0 
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10" 



10" 



10" 



10 



lower bound (2.26) 

— exact error (2.24) 
- - upper bound (2.25) 



10 



10-^ 

N = nQ +n 



10 



Figure 2.3.: Random walk on the star: Exact error and error bounds, 6 = 0.1, T = 10^ 



and no 



log((2~9)T) 
21og(l-e)-i 



58. 



Assume that 

3M eN\ {1} , t/o e and 7 > such that \fx e D -. auix, yo) > i- (2.27) 
Then for any no the error obeys 

e.(5„,„„,/)^<^^^^||/||L. 

717 

Condition (j2.27t states that there is a state yo where the expected value of visiting it, in 
average, until M from every other state is uniformly bounded from below by rate 7. If the 
transition matrix is irreducible then there exists an M such that Am > and one has that 
l|2.27|i is satisfied (see for example [BehOO, Lemma 7.3, p. 50]). It is difficult to obtain 
7 and M. Let us state a toy example where one can compute 7 and M explicitely. Let 
D = {0, 1}''. We consider a Markov chain, which independently samples with respect to 
TT, with n{x) = 2^'^ for X e D. This is Monte Carlo with an i.i.d. sample. Consequently we 
get as best possible parameters 7 = 2"'^^^ and M = 2. The error estimate behaves ex- 
ponentially bad in terms of d. In contrast, the estimate of Theorem |Z20] is independent of 
d. In general, even if one can get 7 and M, then these constants are often exponentially 
bad in terms of some other parameters. Usually 7 is close to zero and M is huge. How- 
ever, with this bound even the periodic case is covered and reversibility is not necessary. 
But on the other hand the optimal coefficient of the leading term of Corollary [2. 13l is 
not reached and the burn-in cannot be used to tune the algorithm. 
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The approach to use the spectral representation of reversible transition matrices is not 
new. In [BD06] the result of Proposition 12.11 l is presented. By the same arguments a 
slightly worse bound is shown in lAld87l Proposition 4.1 , p. 40]. It applies if /3i > and 
gives 

eASnJ? < 11/11^ + ml ■ (2.28) 

Furthermore if the initial distribution i' is not the stationary one, a different algorithm is 
considered. Namely, the burn-in Uq is randomly chosen, independent of (X„)„gN, by the 
Poisson distribution with parameter no, and 



1 " 

^:.no(/)--E/(^^+»s)- 

Then it is proven in IAId87l Proposition 4.2, p. 41] that 





1 














CXD 



exp{-no(l - 



e.(5:,„„,/f<e.(5„,/)2 

This bound applies also for periodic Markov chains and after applying (|2.28t it gives an 
estimate with respect to II II2. The optimal coefficient of the leading term, see Corol- 
Iarv l2.22l is not reached, also if Corollary[2J3]instead of (I2.28t is applied. The burn-in hq 
is randomly chosen rather than deterministically, since then one can translate the discrete 
time Markov chain into a continuous time Markov chain and avoids discussions of neg- 
ative eigenvalues. This technique is similar to the idea of considering a lazy Markov chain. 

In INP091 an explicit error bound is published which holds also for non-reversible Markov 
chains with an absolute ^2-spectral gap, i.e. (3 = ||P||^o^^o < 1. in the proof of the 
error bound the multiplicative reversibilization PP* of P is used, where P* is the adjoint 
operator of P acting on £2- It follows from [NP09, Corollary 4.2. p. 320] that 

e (S f? < ^ + ^ II f 11^ + II f 11^ + + 

e.l^„,„o,;j < II/II2+ (i_^)2„2 II/II2+ (l_/3)„2 ll/lloo 11/112- 

111 /2 

One obtains an error bound uniformly with respect to WfW^ by using \\f\\^ < \\-^\\^ \\f\\2- 
The spectral gap can be implied by aperiodicity and irreducibility of the Markov chain, see 
[LPW09I Lemma 12.1 , p. 153]. But it is remarkable that the chain can be non-reversible. 
If ^ = /3i then the error bound has the right coefficient of the leading term. Then it is 
essentially the same bound as in Theorem r2.20l 

Also confidence estimates of S'„,„„ are of interest. The goal is to achieve for given preci- 
sion e e (0, 1) and confidence parameter a e (0, 1) that 

PT{\Sn,no{.f)-S{f)\>e)<a. (2.29) 

Such approximations for confidence intervals can be implied by the mean square error. 

Lemma 2.27. Let (X„)„gN be a Markov chain with transition matrix P and initial distribu- 
tion V and let e e (0, 1). Then 

Pr(|5„,„„(/) - S{f)\ >e)< "-^^^"'r'^^' . 
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Proof. The result is an application of the Markov inequality. □ 

Suppose that II/II2 < l. If one applies Lemma \227\ and the burn-in is chosen as in 
Theorem l2.25l then it follows for 



no > 



and n > 



log(ri) - 1-^ 



that | |2.29t is true. Note that the burn-in is chosen independently of a. In ILPW09I Theo- 
rem 12.19, p. 165] a similar bound is deduced by coupling arguments. It implies a slightly 
worse result if the initial state is deterministically chosen. If 

log(2a-i||i||^) ^4a-ie-2 
no> ^-p^ and n>^-^ 

then 1I2.29I1 is true. The main difference is the dependence of a in the choice of the burn- 
in. One can essentially boost this confidence level by using a median of independent 
runs of the Markov chain Monte Carlo method. This is explained in [NP09]. 

However, both presented results are far away from well known Chernoff bounds. These 
exponential inequalities for finite Markov chains are shown in [Gil98] for random walks 
on graphs. In [Lez98J, this Chernoff bound was extended and refined for Markov chains 
on finite and general state spaces, furthermore for discrete and continuous time. For 
irreducible and reversible Markov chains on finite D and \\f\\^ < l one obtains from 
|Lez981 Theorem 1 .1 , p. 850] that 



Pr(|5„.„„(/)-5(/)| >e) <3 
In other words, if 











TT 





12 



(2.30) 



-1^ 



"■0 > ' and n > 



log(ri) - 1-A 

then ll2.29t holds true. This is better than using Lemma lZ27] In ILP041 Hoeffding bounds 
for reversible Markov chains are presented. 

Such exponential inequalities also imply an error bound of the mean square error by the 
following well known formula, see for example f Kal021 Lemma 2.4, p. 26]. 

Lemma 2.28. Let (X„)„gN be a Markov chain with transition matrix P and initiai distribu- 
tion V. Then 

eASn,no,f? = r Pr{\Sn^r.„{f) - S{f)\ > V^)de. 







By Lemma |Z28] and by (|2.30t one obtains the following error bound 

^ 36(1 + /?"'' II - lIL) 
sup e.{Sn.noJ?<— ■ 

The asymptotic coefficient as described in Corollary l2.13l and Corollarv l2.22l is not reached. 
However, the error bound applies also for periodic Markov chains. 
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Let us provide a conclusion. Different explicit error bounds for finite state spaces are 
known. The results presented in Section |Z2] are not entirely new. In the literature one 
can find similar estimates where some of the assumptions like aperiodicity or reversibility 
are weakened. The justification and discussion of the burn-in in Section IZ3l and the 
lower bound of Theorem l2.25l seem to be new. In the following we will extend the results 
to general state spaces. 
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3. General state spaces 



In the following we study the mean square error of Markov chain Monte Carlo methods on 
general state spaces. The state space can be countable or uncountable. In Section [Ol 
we provide the basic definitions and properties of Markov chains on general state spaces. 
The estimates of the mean square error are shown in Section |3?2] We suggest and justify 
a recipe how to choose the burn-in in Section 1331 Afterwards the error bound is applied to 
illustrating examples and finally we discuss how the results fit into the published literature. 

3.1. Markov chains 

in this section facts and definitions of Markov chains on general state spaces are stated. 
The paper [RR04] of Rosenthal and Roberts surveys various results about Markov chains 
on general state spaces. For further reading we refer to | Rev84, Num84, MT09I . 

Let (£>, D) be a measurable space. In most examples D is contained in R'^ and D is given 
by B{D), where B{D) denotes the Borel cr-algebra over D. In the following we provide 
the definition of a transition kernel and a Markov chain. 

Definition 3.1 (Markov kernel, transition kernel). The function K -. DxTi ^ [0,l]\s called 
a Markov kernel or a transition kernel if 

(i) for each x e D the mapping A eT) ^ K{x,A) is a probability measure on (-D,D), 

(ii) for each ^ e D the mapping x e D K{x,A) is a S-measurable real-valued 
function. 

Definition 3.2 (Markov chain). A sequence of random variables (X„)„gN on a probability 
space (O, J", Pr) mapping into {D,T)) is called a Markov chain with transition kernel K if 
for all n e N and AeD one has 

Pr(X„+i eA\Xi,.. . ,X„) = Pr(X„+i eA\Xn) = K{Xn,A) almost surely. 

The distribution 

v{A) = Fr{Xi eA), AeD 

is called the initial distribution. 

Suppose that we have a transition kernel K and a probability measure ly. For simplicity 
let us assume that L» c K'* and £> = B{D). For any transition kernel there exists a 
random mapping representation, see for example Kallenberg [Kal02, Lemma 2.22, p. 34]. 
A random mapping representation is a measurable function $: £) x [0, 1] ^ £>, which 
satisfies 

Pt{^{x,Z) e A) K{x,A), xeD,AeD, 

where the random variable Z (r^, J", Pr) ([0, 1],S([0, 1])) is uniformly distributed. Then 
a Markov chain can be constructed as follows. Let (Z„)„gN, with Z„ : (f2,J", Pr) 
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([0, 1],;B([0, 1])), be a sequence of i.i.d. random variables with uniform distribution, and 
assume tinat Xi has distribution v, then one can see tinat (X„)„gN defined by 

is a Markov cinain witin transition kernel K and initial distribution i^. 

The transition kernel K a Markov chain describes the probability of getting from state 
x e £> to A e 2) in one step, i.e. for all fc e N one has 

K{x,A) = FT{Xk+i e A \ Xk = x). 

The n step transition kernel is inductively given by 

K^x,A)^ [ K''-\y,A)K{xAv)^ I Kiy, A) K"-\x,dy). 

J D JD 

The first equality of the previously stated equation is the definition and for a proof of the 
second equality see fRev84 Proposition 1 .6, p. 1 1] or IMT09I Theorem 3.4.2, p. 61]. The 
function A'" again constitutes a transition kernel. The n step transition probability from 
state X e Dto AeDls 

Pr(Xfe+„ eA\Xk = x) = K^{x, A). 
This is seen by integrating over the conditional distribution of the previous step: 

VY{Xk+i ^ A \ Xk = x) = K{x,A), 

Pr(Xfe+2 ^A\Xk=x)= [ Pr{Xk+2 e A \ Xk+i =y,Xk = x)Pi{Xk+i e dy \ Xk = x) 

JD 

= I Pr(Xfe+2 e A I Xk+i = y) K{x, dy) = K^{x, A), 

JD 

Vv{Xk+n e A I Xfc = X) = / Vl{Xk+n e A I Xk+n-l =y,Xk= x) 

JD 

X Vi{Xk+n-i (^dy\Xk^x) 
= / Vi{Xk+n e A I Xk+n-l = y) K'^^^x, dy) = K"ix, A). 

JD 

In the following let us assume that we have a Markov chain {Xn)neN with transition ker- 
nel K and initial distribution v. The expectation E,yj^ is taken with respect to the joint 
distribution of {Xn)neN, say W^^k, which is defined on {D^,a{A)) where 

= {uj = (xi,x2,...) I X, e £» for all i > 1} and 

[J {Ai X A2 X ■ ■ ■ X Ak X D X ■ ■ ■ \ A, e T), i = 1, . . . , fc} , 

feGN 

see [MT09', Theorem 3.4.1, p. 60] or IRev841 Theorem 2.8, p. 17]. For fc e None has with 

Ai X ■ ■ ■ X Ak c D'' that 

W,,k{Ai x---xAkxDx---)= Pr{Xi e Ai, . . . e Ak) 

= /" [ ■■ [ K{xk^i,Ak)K{xk-2,dxk-i)-.-K{xi,dx2)vidxi). ^'^'"'^ 

JAi JA2 JAk-i 
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Now we present properties of transition kernels. These properties have finite state space 
counterparts, see Section IZTl 

By M{D) denote the set of real-valued signed measure^] on (D,!)). For any 1/ e M{D) 
let us define 

Jd 

Note that the mapping u ^ vP"' defines a linear operator on M{D). If is a probability 
measure then vP™ is the distribution of X^+i for a Markov chain {Xn)nm with transition 
kernel K and initial distribution v. 

Definition 3.3 (stationarity). Let tt be a probability measure on (D,£>). Then tt is called 
a stationary distribution of a transition kernel K if 

ttP{A)^tt{A), yle£t. 

Roughly spoken that means: Choosing the initial state with respect to a stationary distri- 
bution TT, then, after a single transitions the same distribution as before arises, i.e. 

Pr(Xi eA) = tt{A) = ttP{A) = Vt{X2 e A), Ae D. 

Definition 3.4 (reversibility). Let tt be a probability measure on {D, D). A transition kernel 
K is called reversible with respect to tt if 

/ K{x,A)T:{dx) = I K{x,B) n{dx), A,BeD. 
Jb J a 

If a transition kernel K is reversible with respect to a distribution tt, then tt is a stationary 
distribution of K. If the initial distribution of a Markov chain with transition kernel K is tt, 
then reversibility with respect to n is equivalent to 

Pi{Xi eA,X2eB) = Pr(Xi eB,X2e A), ^, S e D. 

A Markov chain is called reversible with respect to tt, if the corresponding transition kernel 
is reversible with respect to tt. 

Definition 3.5 (lazy version). Let K be a transition kernel and let 1^(3;) be the indicator 
function of A e D for x e £>. Then we call 

K{x,A) ^]^{1a{x) + K{x,A)), xe-D, AeD, 

the lazy version of K. 

If TT is a stationary distribution of K,Jhen tt is also a stationary distribution of K. If K 
is reversible with respect to tt, then K is also reversible with respect to tt. For a Markov 
chain with transition kernel K and initiaj distribution v we may define a lazy Markov chain, 
a Markov chain with transition kernel K and initial distribution v. 



The set function ^i: D -> M is a real-valued signed measure if /^(0) = and for pairwise disjoint Ai, A2, . . . , 
with Afc G D for fc e N, one has ^l{y^^^^A^,) = E^i M(-4fc). 
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Assume that tt is a stationary distribution of a transition kernel K and let / : Z) ^ M be an 
integrable function with respect to tt. Let us define 

P"'f{x)= f f{y)K"'{x,dy), xeD,meN. 

JD 

We call P Markov operator or transition operator If a Markov chain {Xn)neN with transi- 
tion kernel K and initial distribution 6x, the point mass at x e D, is given, then P'"^f{x) is 
the expectation of f{X.m+i). 

Let us state some well known properties of the operator P acting on functions and on 
signed measures. 

Lemma 3.6. Let n be a stationary distribution of transition kernel K and let f: D ^ R be 
an integrable function with respect to n. Then one obtains forv e M{D) that 

( f{x){vPmdx)= ( {P'^f){x)v{<lx), meN, (3.2) 

J D J D 

whenever one of the integrals exist. In particular 

S{f) = I fix) n{dx) = / 7r(da;), m G N. (3.3) 

JD JD 

Proof Equation (13.31 is an immediate consequence of 1 13.21 and stationarity. Hence one 
has to prove 1 13.21 . The equality holds for indicator functions and for simple functions. 
Then by the standard procedure of integration theory the equality can be extended to 
positive and afterwards to general integrable functions. □ 

Note that if a Markov chain {Xn)neN with transition kernel K and initial distribution u is 
given, then | |3.2| can be rewritten as 

The following result is well known, see for example |LS93I equation (1 .2), p. 365]. 

Lemma 3.7. Let the transition kernel K be reversible with respect to tt and letF: DxD ^ 
R. Then 

f f F{x,y)K"'{x,dy)n{dx)^ f f F{y,x)K"\x,dy)7r{dx), m G N, (3.4) 

JDJD JDJD 

whenever one of the integrals exist. 

Proof The reversibility of the transition kernel K implies reversibility of the m step tran- 
sition kernel K"\ Hence it is sufficient to show the assertion for m = 1. By using the 
reversibility one has 

/ / lAxBix,y)K{x,dy)Tr{dx) = / lAxB{y,x) K{x,dy) Tr{dx), A,BeD. 

JDJD JDJD 

The equality of the integrals can be extended to arbitrary sets C g 2) (g) D, where D (X) 23 
is the product o-algebra of D with itself. This is an application of the Dynkin's Theorem. 
Then it is straightforward to consider the cases where F is a simple function, a positive 
function and finally an integrable one. □ 
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For p e [1, oo) let us define 

Lp = Lp{D,TT) = |/: ^ R I ||/||P = \f{x)f 7r(dx) < ooj . 

Forp = CX3 the essential-supremum norm with respect to n is defined by 

ll/lloo = esssup|/(y)|== inf sup |/(y)| , 

such that 

Sometimes it is convenient to consider bounded functions on D, not 7r-a.e. bounded 
ones, thus we define 



Lb = Lb{D) = /: ^ M I I/I = sup \f{x)\ < 



oo 



xeD 



The next result is standard, see for example IBR95I Lemma 1 , p. 334]. 

Lemma 3.8. Letp e [1, oo]. For any transition kernel K witli a stationary distribution n it 
follows that 

||P/||,< 11/11, and ||P||^^^^^ = 1. 
Proof. If p < oo, then by Jensen inequality (J) and | |3.3> one obtains 



f^\Pfix)\P TTidx) < J^(^Jjfiy)\K{x,dy)y nidx) 



< / / \fiyrKix,dy)nidx)= [/(a;)^ 7r(d:r). 
(J) Jd Jd iMl Jd 

Since TT is a stationary distribution of the transition kernel one has for e S that 

n{N) = ^ K{-,N) = TT-a.e. 

Null sets with respect to n are the same as null sets with respect to K{x, •) for almost all 
X e D. Hence 

\Pf{x)\ < [ \fiy)\K{x,dy)<\\f\\^ n-a.e. 

and we have < ||/||p forp e [l,oo]. Let u{x) = 1 for all x e D. Then Pu = u with 

ll^llp = 1 and we obtain WPh^^^, = 1- ° 

The closed subspace 

Ll = {feLp\ S{f) - 0} 
of Lp is important. Note that L2 and are Hilbert spaces with scalar product 



(/,.g)= / fix)gix)7Tidx). 
Jd 

Then 

L2 = L^®(i^)^, where (L^)^ = {/ e L2 | / = c, c e M} 
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On the Hilbert spaces L2 and there exists the adjoint operator P* such that 

(F/,5) = (/,P*g>. 

Furthermore 

\\P\\li^li^\\P*\\l%^li ancl \\P ~ S\\^^^^^^\\P* - S\\^^^^^. 

The following facts about adjoint operators are helpful. Let T -. Lp ^ Lp, with p e [1,00), 
be a linear bounded operator. Then the adjoint operator T* -. Lq ^ Lq, with q e (1, 00], is 
defined as follows. Suppose that p and q are chosen such that p^^ + q^^ = 1. It is well 
known that Lq is isometrically isomorphic to the dual space {Lp)', where the isomorphism 
is given by 

A-.Lq^iLpY, A{g){f) = {f,g), f e Lp. 

Then there exists the dual operator -. (Lp)' (Lp)' and the adjoint operator acting on 
Lq can be defined as T* = A^'^T^A. Fiqure lSTI illustrates the construction by a diagram. 

Lq > Lq 

A 

{LpY {Lp)' 

Figure 3.1.: Illustration of the definition of the adjoint operator T* Lq^ Lq of 

T: Lp^Lp. 

Furthermore, for all f e Lp and for all g e Lq one has 

{f,T*g) = {f,A-'T-Ag) - A{A-'T- Ag){f) 

^{T-A){g){f) = A{g){Tf) = {Tf,g). 
(dual operator) 

Then 

II J II (Lp)' — 

= snp \\A-'T-Ag\\ =\\T*\\,^^,^. 

Il9ll,<l 

If T = P - 5, then it follows that 

II^^-^IIw, = II^*-^IIl,^l,- 

Let V e M{D). If there exists a density of v with respect to tt then we denote it by ^ and 

for q e [1, 00] let 

1 1 1 I 



\ air \ \q ^ ' 



otherwise. 



Set 



Mq=Mq{D,7T) = {ueM{D) \ \\iy\\g < CX) | 
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The function space is isometrically isomorplnic to tine space of signed measures Mq, 
in symbols Lq = Mq. The space of singed measures M2 is a Hilbert space and tine 
scalar product is the L2 scalar product of the densities, one has 

dn ' dn 



Furthermore set 
Then 



M"^ = {ly e Mq \ i^iD) ^ 0} . 



M2 = M°2®iM°2)^, where (M^)-^ = {j/ e M2 k = c • tt, c e M} . 



Clearly, M2 is also a Hilbert space. We have L§ = and (L^ 



Let us 



recall that the transition kernel applies to signed measures ly e Mq as 

i^p{A)= [ K{x,A)iy{dx), AeT). 
Jd 

Lemma 3.9. Let K be a transition kernel and let n be a stationary distribution ofK. 

(i) Letq e (l,oo] andf e Mq. Then 



and 



\P\ 



« - \\P\\m°^m° ■ 



(ii) Reversibility with respect to n is equivalent to P being self-adjoint acting on L2 and 
M2, i.e. 

{Pf,g) = {f,Pg) and {vP,ii) = {v,^iP). 

Proof. First, let us prove assertion H). For all f e L,, with pchosen such thatp^^+g^^ = 1 
one has 



d{iyP) 



Hence we have 7r-a.e 



d{iyP) 



By using the previous equation one obtains 

difiP) 



|||^||^ = l,p(D)=0 
= \\P*\\rO .ro = II 



c?7r 



= sup 

2 ||l^||,=i.s(^)=o 



Let us turn to assertion ©. It is clear that self-adjointness implies reversibility. The other 
direction follows by 

{Pf,g)^ f I fiy)g{x)Kix,dy)Adx) = f f fix)g{y) Kix,dy)Ad^) = if^Pg) ■ 
Jd Jd iMl Jd Jd 

The result with respect to M2 is shown by using and the self-adjointness of P on 
L2. □ 
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In the following we introduce several convergence properties of a Markov chain (X„)„gN 
with transition kernel K and initial distribution v. We assume that tt is a stationary distri- 
bution of K. The goal is to quantify the speed of convergence of vP'"^ to tt for increasing 
m e N. For further details let us refer to lRR97a] , [RR04I or [Che05]. 

Definition 3.10 (L2-spectral gap). Let P be the Markov operator with corresponding tran- 
sition kernel K. Then there exists an (absolute) L2-spectral gap, if 

t3-\\P\\L0^L0<h 

where the L2-spectral gap is given by 1 - ^. 

Let us briefly explain what this means for reversible transition kernel. If the transition 
kernel K is reversible with respect to tt, then let spGc(P|i2) be the spectrum of the self- 
adjoint operator P acting on L2 and spec(F|L^) be the spectrum of P acting on ij- Since 
\\P\\l2^L2 - 1 spectrum spec(F|L2) is contained in [-1, 1]. Let us define 

A = inf {a I a e spec(P|L2)} and A = sup {a | a G spec(P|L2)} • 
Since P is self-adjoint, it is well known that 

A= inf (Pg,g) and A= sup 

|ls|!2 = l,seL0 \\g\\2 = l,geL° 

Then we have 

spec(P|L!]) c [A, A] and /3 = II^'IIlo^lo = niax{A, |A|}. 

The existence of an i2-spectral gap implies that -1 < A < A < 1, consequently there is 
a gap between 1 e spec (P|i2) and 13, the second largest absolute value of spcc(P|i2)- 

Definition 3.11 (L2-geometric ergodicity). A transition kernel K with stationary distribu- 
tion TT is called L2-geometrically ergodic, if for all probability measures u e M2 there 
exists an a G [0, 1) and < 00 such that 

II^^P" -7r||2 < C^a", nGN. 

An L2-spectral gap implies L2-geometric ergodicity. 

Proposition 3.12. Let K be a transition kernel witli stationary distribution n. Assume 
tliat tlie Markov operator P lias an L2-spectral gap, i.e. 1 - /3 > 0. Tlien tlie transition 
kernel K is L2-geometrically ergodic. 

Proof. If 1/ e M2 and i^{D) = 1, then one obtains (ly - tt){D) = and the proof is 
completed by 

\WP- 7r)P"||2 < ||P||X,o^^o 11^^ - ttII^ = |k - ttH^ . □ 

If the transition kernel is reversible with respect to tt, then L2-geometric ergodicity and 
the existence of an L2-spectral gap are equivalent. This result is shown in [RR97a]. 

Proposition 3.13. Let the transition kernel K be reversible with respect to tt. Then the 
following statements are equivalent: 

(i) The transition kernel is L2-geometrically ergodic. 
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(ii) The Markov operator P has an L2 -spectral gap. 
Proof. See | RR97al Theorem 2.1 , p. 17]. 



□ 



For further details and even more equivalences of i2-geometric ergodicity, see IRR97al 
[RT01]. The next definition is similar to Lp-exponential convergence in [Che05]. 

Definition 3.14 (Lp-exponential convergence). Let p e [l,oo], let a e [0,1) and M < 
00. Then the transition kernel K with stationary distribution n is called Lp-exponentially 
convergent with {a, M) if 



|P" - S\\ 



The transition kernel is called Lp-exponentially convergent if there exist an M < cx3 and 
an a e [0, 1) such that it is Lp-exponentially convergent with (a, M). 

The Markov chain is called i2-geometrically ergodic or Lp-exponentially convergent if the 
corresponding transition kernel K is L2-geometrically ergodic or Lp-exponentially conver- 
gent. 

Let p and q be chosen such that + = 1. The condition of Lp-exponential conver- 
gence implies convergence of i/P" to the stationary distribution tt for increasing n e N in 

M,. 

Corollary 3.15. Letp e [1,00) and v e Mq with p^^ + q^^ = 1. Let the transition kernel 
K with stationary distribution it be Lp-exponentially convergent with {a, M). Then 



\WP^' -Aq<M\\u~7:\\^a- 
Proof. The assertion is proven by 

d((i/-7r)P") 



l(^-'r)P"ll, 
((P")* - S) 

<\\P''-S\\r^ 



dn 



n e N. 



dv 
dn 



fdv 



< \\iP''-sy 



- 1 

dv 
dn 



dv 

dn 



1 



< MWv-nW^a''. 



□ 



In the following we consider relations between the existence of an L2-spectral gap and 
Lp-exponential convergence. First, let us add some helpful inequalities. 

Lemma 3.16. Let n be a stationary distribution of transition kernel K. Then 



|P"I 



\P'^-S\\ 



< n e N. 



If p e [1,00] then 



Proof Note that if P is a normal operator, i.e. PP* 
ne has p 

|P"-S'|| 



neN. 
P*P, then ||P"||^o 



(3.5) 
(3.6) 



otherwise one has ||P"||ip^iO < IIPHlp^lp = z^"- ^y 



= sup ||(P"-5)/||2 

Il/ll2<l 

< sup ||P"5ll2 

Ilffll2<l. S{g)=0 



sup ||P"(/-^(/))|l2 

/Il2<l 



IP" I 
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and 

||P"|l^o^^o= sup |lP"5||p= sup \\P^g-S{g)l 

" " Il9llp<l, S(g)=0 llffllp<l, •S(g)=0 

< sup ||(P"-5)/||^ = ||P"-5|l^^_,^^ 

llfll <1 

I J Hp- 
claim ll3.5t and the first part of Il3.6l l are sinown. Furtlnermore one obtains 



\P"-Sh,^L,= sup ||P"/-5/llp = 2 sup 

llfll <i llfll <i 



P'\\{f~Sf)) 



<2 sup ||P",g||, = 2||P"|l^o^io, 

\\9K<hS{g)=0 " " 

which finislnes tine proof. □ 

In a general setting it follows that an P2-spectral gap implies Pp-exponential convergence 

for all p e (1, oo). 

Proposition 3.17. Let p e (l,oo). Let tt be a stationary distribution of transition kernel 
K and n e N. Tlie existence of an L2-spectral gap, 1 - /3 > 0, implies Lp-exponential 
convergence. We obtain 

IIP" S\\ < [2^/^/3'""^, ^^£(1,2), 

Proof Letpe (1,2). Lemma lSTSl aives 

I1P"-5||^^^^^ </?" and ||P"-5||^^^^^ <2. 

We apply Proposition IA.4I (Interpolation Theorem of Riesz-Thorin), where T ^ P^^ - S 
and (7i = 2, = 1 such that 9 = The case where p e {2,oo) follows by the same 
interpolation argument, since by Lemma ISTSj one has 

||P"-^|li,^i, </3" and IIP" - ^11^^^^^ < 2. □ 

From Proposition 13. 17! and actually already from Il3.5t it follows that an i2-spectral gap 
implies P2-exponential convergence. With the additional assumption of normality of P 
one can prove the reverse direction. 

Proposition 3.18. Let ir bea stationary distribution of transition kernel K. Let the Markov 
operator P be normal, i.e. PP* = P*P. Then the following statements are equivalent: 

(i) There exists an L^-spectral gap, i.e. 1 - (3 > 0. 

(ii) There exist an a e [0, 1) and M < oo such that the transition kernel K is L2- 
exponentially convergent with {a, M). 

In particular % implies 

/3=||P-5|L,^i,<a, 

so that 

^ = min {a I 3 Ai" < cx) with ||P" - 511^,^^^ < Afa", n G N} . 
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Proof. By | |3.5t of Lemma [3T6] one has that Q implies lE) with {a,M) = {(3, 1). Now we 
show that (E) implies (O. One has 

where PP* is self-adjoint and (P*)" = (F")* for all n e N. Then 

IIP" - 5iii^_^^ = iiP"iiio^io = iiP"(p")iLo^io 

^||p„(p*y„|| ^ ||(PP*)"|IlO^LO 

II V ; "■f-2->--f-2 (normality) ii^2^-L2 



such that 



\P--S\\r^r<Ma^ ^ |l(PP*)"|l^o^^o <MV". (3.8) 



By the spectral radius formula and the self-adjointness (s-a) of PP* one obtains 

ll^llie-LO = II^^Ilo^lo -^r[PP*] 



2 



= lim (||(PP*)"ILo^io)i/" < a' lim (Af2)V" < a 

n— >oo 2 2 |3 g| n— >-C30 

Hence the proof is completed. □ 

By an interpolation argument we get that ioo-exponential convergence or ii-exponential 
convergence imply an L2-spectral gap if the Markov operator is normal. 

Proposition 3.19. Let n be a stationary distribution of transition kernel K. Let K be Li- 
exponentially convergent or Loo-exponentially convergent witfi (a, M). Suppose tfiat ttie 
Markov operator P is normal, i.e. PP* = P*P. Then there exists an L2-spectral gap, in 
particular one obtains 

/3 = IIP- ^11 L.^L. <V^- (3-9) 

Proof. We show that ii-exponential convergence with {a,M) implies /3 < For Loo- 
exponentially convergent Markov chains the claim follows by the same arguments, where 
the roles of L^o and Li are interchanged. 
By the assumptions of the proposition and Lemma [3T6] one has 

||P"-^|li,^i, <a"M, and ||P"-5||^^^^^ <2. 



By Proposition IA.4I (Interpolation Theorem of Riesz-Thorin), where T = P" - S* and 
gi = 1, 92 = oo, 6* = i one obtains L2-exponential convergence with (^/a, 2^/2^//^/^). 
Then Proposition [3T8]implies /3 < ./a and the proof is completed. □ 

Another way to measure the convergence of i^P" to tt for increasing n e N is provided by 
using the total variation distance, defined as follows. 

Definition 3.20 (total variation distance). The total variation distance between two prob- 
ability measures i^, e M{D) is defined by 

lk-Mlltv= sup \v{A) ~ ^i{A)\ . 
The total variation distance can be considered as an Li-norm. 
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Lemma 3.21 . Let v,ii e M{D) be probability measures. Tlien 



Ik-^lltv = 7; sup 
^ I/I<1 



D 



(3.10) 



wliere \f \ = sup^^^ \f{x)\. Ifv.^ie Mi, tlien \\v - = 5 Ik - mIIi • 

Proof. See tRR04| Proposition 3, p. 28]. □ 

Now we can define uniform ergodicity of a transition kernel K. 

Definition 3.22 (uniform ergodicity, vr-a.e. uniform ergodicity). Let M < 00 and a e [0, 1). 
TInen tine transition kernel K with stationary distribution n is called uniformly ergodic witfi 
(a, M) if one has for a\\ x e D that 

<Ma", n e N. (3.11) 

If the inequality of (13.1 It holds 7r-a.e, rather than for all x e D, then the transition kernel 
K is called n-a.e uniformly ergodic with {a, M). A Markov chain with transition kernel K 
is called uniformly ergodic or ir-a.e uniformly ergodic if there exist an M < cx) and an 
a e [0, 1) such that K is uniformly ergodic or 7r-a.e uniformly ergodic with (a, M). 

Obviously, if the transition kernel is uniformly ergodic then it is also vr-a.e. uniformly er- 
godic. Note that in other references, e.g. ICheOSl . uniform ergodicity is called strong 
ergodicity. 

Uniform ergodicity is closely related to Loo-exponential convergence. An important rela- 
tion is presented in the following proposition. Recall that Lb = Lb{D) denotes the class 
of bounded functions on D. 

Proposition 3.23. Let a e [0, 1) and M < 00. Let n be a stationary distribution of 
transition kernel K. Then the following statements are equivalent: 

(i'j The transition kernel K is uniformly ergodic with (a, M). 

(ii') The transition operator P satisfies 

Furthermore H) and 0'j imply the following equivalent statements: 

(i) The transition kernel K is ir-a.e. uniformly ergodic with (a, M). 

(ii) The transition kernel K is Loo-exponentially convergent with {a, 2M). 

Proof. By Lemma |3]2T| the equivalence of 0) and P) holds true. The equivalence of 
and (HU remains to prove. First, let us show that 7r-a.e. 

sup |P"/(a;) - S{f)\ = sup \P^f{x) ~ S{f)\ . 

Il/lloo<l I/I<1 

Note that 

7r(iV)=0 ^ K'^{-,N)^0 TT-a.e. 
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for all N e D and n e N, since tt is tine stationary distribution. Suppose tinat / e Loo- 
Obviously, if e D and 7r(iV) = tinen 7r-a.e. 

\P-fix) - - |P"(lAre/)(:E) - . 

Let 11/11^ < 1, i.e. 7r({a; G B : f{x) > 1}) = 0. Define 



fix), f{x)<l, 
1, f{^) > 1, 



such that f{x) = g{x) holds 7r-a.e. and \g\ < 1. Thus, 7r-a.e. 

\PVix) - Sif)\ = |P"5(x) - S{g)\ < sup |P"5(x) - , 

|9|<1 

so that TT-a.e. 

sup - S{f)\ < sup |P"g(a:) - 5(5)! . 

ll/IL<i l9l<l 

The inequality in the other direction is clearly also correct, i.e. 7r-a.e. 

sup |P"/(x) - S{f)\ = sup |P"<?(x) - 5(5)1 . 
Il/ll^<i lsl<i 

By applying the essential-supremum on both sides of the previous equation and (13.1 Ot 
one obtains 

11^" - Sh^^L^ = 2 esssup \\K"{x, •) - 7r||,, . 
xeD 

Hence the proof is completed. □ 

It is known that there are transition kernels where the Markov operators have an L2- 
spectral gap and the transition kernels are not uniformly ergodic, see [MT96J. Further- 
more, uniform ergodicity implies an L2-spectral gap, see |R R97al . In this sense uniform 
ergodicity is a stronger property than the existence of an L2-spectral gap. 

Proposition 3.24. Let a e [0, 1) and M < 00. Let the transition l<ernel K be reversible 
witli respect to n. Tlien tlie following statements are equivalent: 

(i) The transition kernel K is Li -exponentially convergent with [a, 2M). 

(ii) The transition kernel K is Loo-exponentially convergent with {a, 2M). 
(ill) The transition kernel K is n-a.e. uniformly ergodic with {a, M). 

Each of the conditions imply that the Markov operator has an Lz-spectral gap. We have 

Proof First we prove the equivalence of and lE). By reversibility one can see for / e Li 
and /i e Loo that 

{iP-~S)f,h) ^{f,iP--S)h). 
The adjoint operator of P" - 5 acting on Li is P" - S acting on Loo- Then, one has 
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and the equivalence is obvious. 

By Proposition 13. 23l one lias tinat lE) is equivalent to 

The last implication follows by an interpolation argument. Proposition IA.4I (Interpolation 
Theorem of Riesz-Thorin) with qi = oo, q2 = 1 and = 1/2 is applied. Then, 



(3.12) 



Because of the self-adjointness (s-a) of P one can apply the spectral radius formula and 
one obtains 



P = \\P\\l«^l% =. r[P] = lim (||P"ILo^io)i/" < a ■ lim (4M) 



□ 



I3l2l 



In Figure |3^ we present a survey of the discussed relations between the terms of con- 
vergence and ergodicity. 



uniform 
ergodicity 



TT-a.e. uniform 
ergodicity 



L2-geometric 
ergodicity 



L2-spectral 

gap 



Loo-exp. 
convergence 



L2-exp. 
convergence 



Li-exp. 
convergence 



Figure 3.2.: Ergodicity terms and their relations are illustrated. A solid line represents 
the implication without any assumption of reversibility. A dashed line represents the 
implication under the assumption of reversibility. 



3.2. Error bounds 

in this section we prove error bounds on general state spaces. We assume that we have 
a Markov chain (X„)„gN with transition kernel K and initial distribution v, where tt is a 
stationary distribution, and compute 

1 " 

Sn.noif) = - fjXj+nq) 

as approximation for S{f) = f{x)TT{dx). The error is measured in the mean square 
sense, i.e. 

Now let us present a helpful result. 
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Lemma 3.25. Let {Xn)neN be a Markov chain with transition kernel K and initial distribu- 
tion V. Then forijef^ with j < i it follows that 

(3.13) 
(3.14) 



Jd 

Moreover, if n is a stationary distribution and v ^ it then 
Proof The calculation 



JD JD 

f{xj)P'-^ f{xj) K{xj-i,dxj) . . . K{xi,dx2) i^idxi) 



pj[fp--ij){x)v{dx) 



proves (I3.13> and by (13. 3> one can see | |3.14> . 




□ 



First we assume that the initial distribution of the Markov chain is a stationary one. Hence 
it is not necessary to do any burn-in, i.e. no = 0. The resulting method is denoted 
by Sn instead of Snfl- Afterwards we turn to the general method S'„^„o where the initial 
distribution might differ from a stationary one. 

In the next statement we assume that the transition kernel is reversible with respect to it. 
Then we can apply the Spectral Theorem for linear, bounded and self-adjoint operators, 
see Theorem lA.2l 

Proposition 3.26. Let f e L2 and g = f - S{f). Let (X„)„gN be a Markov chain with 
transition kernel K and initial distribution it, let K be reversible with respect to tt and let 

A = inf {a I a e spcc(P|L2)} , A = sup {a | a e spec(P|L2)} . 

Suppose that A < 1. Then 



W{n, a) d {E{^}g, ff) = — {W{n, P)g, g) , 



(3.15) 



where E denotes the spectral measur^ which corresponds to P: L2 ^ and recall 
that 

n{l - a^) - 2a(l - a") 



Win, a)- 

(1 — a)"^ 

Proof. Since / e L2 we have g e L^. The error obeys 



ae [-1,1). 



1 " 



K 



n— 1 n 



^ i ^ I L J. f t 

-^E.,K[g(x,)'] + -E E ^^,K[9iXj)g{x.) 



i=i 



^The definition of a spe ctral measure and the Spectral Tfieorem for linear, bounded self-adjoint operators are 
stated in Section IXTI 



48 



For i, j e N with j <i\Ne obtain 

where the last equality is an application of Theorem |A]2l Altogether this gives 



n + 2 



n— 1 n 
3=1 i=j+l 



n+l 



(l-a)2 

= J- f w{n, a) d {E^^yg, g) = ^ {Win, P)g, g) 
n Jx n 



□ 



By the Spectral Theorem we have a representation of the error depending on the Markov 
operator P. In this setting one can show a relation between the operator norm of 
W{n,P): and the maximal error of Sn for integrands / which satisfy II/II2 < 1. 

This is stated in the next corollary. 

Corollary 3.27. Let {Xn)neN be a Markov chain with transition K and initial distribution 
TT, letK be reversible with respect to n and suppose that A < 1. Then 



sup eASnjf = \\\W{n,P)\\^o.L. 

Il/ll2<l 



1 + A 



2A(1 - A") 



< 



A) n2(l - A)2 " A)' 



Proof. The last inequality of the assertion follows by Lemma I2T2] The mapping a ^ 
W{n,a) of Proposition 13.261 is increasing, see also Lemma I2T2] For 5 = / - S{f) we 
have 



1 



W{n, a) d g) < —W{n, A) / d g) 



-^W{n,A) {g,g)^ 



1 + A 2A(1 - A'^ 



n(l-A) n2(l-A)2 



The assertion is proven by 



W{n,A)^ max |W^(n,«)| = ||W^(n,P)|l^o^^o= sup (W(n,P)g,g) 

aespcc(P|L«) 2 2 \\g\\^<l,geL° 

= sup ■ e^{Sn,gY <n^ sup e^(5'„,/)^. 

\\gh<hgeLO ||^j|^<i 



□ 



If the transition kernel K is reversible with respect to n and the Markov operator has an 
L2-spectral gap, then 

/3 = ll^llL0^L0=max{A,|A|}<l. 

Note that Proposition 13.261 holds already if A < 1. Hence an L2-spectral gap is not 
necessary. If the transition kernel K is not reversible but one has an L2-spectral gap, 
then the following error bound can be shown. 
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Proposition 3.28. Let (X„)„gN be a Markov chain with transition l<ernel K and initial 
distribution n. Let n be a stationary distribution of K. Let f e L2 and assume that there 
exists an L2-spectral gap 1 - /3 > 0. Then 

eASnJ)'<^^^^^\\f\\l. (3.16) 

Proof. Lelg = f - S{f). The error obeys 

n ^ n—l n 

j=i j=i i=j+i 

For with j < i we have by the Cauchy-Schwarz inequality (CS) that 



Then, with W{n,f3) from Proposition l3.26l one has 

The estimates of the error under the assumption that the initial distribution is a stationary 
one seem to be restrictive. If we could sample tt directly we would approximate S{f) by 
Monte Carlo with an i.i.d. sample. However, even if it is possible it might happen that 
the direct sampling procedure is computationally expensive, such that it is reasonable to 
generate only the initial state by sampling from tt and afterwards run a Markov chain with 
stationary distribution tt. 

The error of a Markov chain Monte Carlo method with stationary initial distribution is 
related to the error with not necessarily stationary initial distribution. 

Proposition 3.29. Let r e [1,2], let f e L2r and let v e A^r/(r-i) be a probability 
measure. Let {Xn)neN be a Markov chain with transition kernel K and initial distribution 
V and let n be a stationary distribution ofK. Then 

n— 1 n 



eASn,noJf = eASnjf + ^Y.^^+^o{9^^ + ^T. E ^.+"0 (ff ) : (3-17) 
where g ^ f - S{f) and 

L,{h)^l^{P'~S)h,{^-l)^, heLr,teN. 
Proof. The proof is adapted from I RudOQI Lemma 6, p. 17]. One has 

^ n n 
" i=i i=l 

f P"o+.(^2)(^)^(^^)^^^ J2 I P-^^+\gP^-=g){x)v{dx). 

^ 1 J D ^ : 1 7 ■ I -I J D 



n , ,, 
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For h e L,. and e Mr/{r-i) we have for all i e N that ^ • P*/i is integrable with respect 
to TT. Then the following transformation holds true 



{P'h){x)iy{dx) = { P'h 



dv 
diT 



dv 



= {p^h,\) + {P'h,{—-\) 



= {P'h,l) + ({P'-S)h,{^-l)^ 

= j^{p'h){x)^{dx) + {{P'- s)h, - 1; 



Formula | |3.17| | is shown by using the previous calculation for h and h ^ gP^ ^g. □ 

Equation (13.171 is still an exact error formula. The next lemma provides an estimate of 
the functional Lk{-) for fc e N. 

Lemma 3.30. Letr e [1,2], v e Mr/(r-i) and h e Lr. Recall that i3 = II^IIlo^lo. 
(i) Ifr e (1,2], then 

dv 



dn 



- 1 



\h\\ 



keN. 



(3.18) 



(ii) lfr = l and the transition kernel is Li-exponentially convergent with {a, M), then 



\Lkih)\ < Ma^ 



dv ^ 
dn 



\\h\\i, fceN. 



(3.19) 



Proof. After applying Holder's inequality (HI) with conjugate parameter r and s = ^r^j to 
Lkih)^{{P''-S)h,{^-l))onehas 



\Lk{h)\ < \\{P''-S)h\ 

(HI) 



dv 
d-K 



- 1 



< Wp" ~ s\ 



dv 
d-K 



- 1 



\h\\ 



By equation ||3.7| | the claim of is proven and by the Li-exponential convergence the 
inequality of © holds. □ 

Note that if r = 2 t hen o ne has \Lk{h)\ < (i^ ||^ - ijj^ \\h\\^, see |3?5). This is by a factor 
of two better than i l3.18t , but not essentially different. 

In Lemma [330] we have seen that under suitable assumptions one can ensure an expo- 
nential decay of Lk{-) for increasing fc e N. This fact is used to show for reversible Markov 
chains which are Li-exponentially ergodic with (a, M) that there exists a constant C^^a,M' 
which is independent of n and no, such that 

An immediate consequence of the inequality is an explicit error bound. The following 
lemma and remark imply such an inequality and provide Ci,,a,M explicitly. 
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Lemma 3.31. Let (X„)„gN be a Markov chain with transition kernel K and initial distri- 
bution V, where ly e Moo- Let K be reversible with respect to n and Li-exponentially 
convergent with (a, M). Let f e L2 and 

n n — 1 n 

j = l j=l k=j + l 



Then 



|ei/(S'ri,no , /)^ — Ctt ("S'n, /)^ I < 



[/(a, n) 



M 



dv 



(3.20) 



Proof. Let g = f - S{f ). The equation d3.17t implies 

\e.{Sn,noJr~e.{Sn,fr\<-Y.\L,+r.o{9')\+-J2 E \Lj+no{9P'-'9)\ 



n—1 n 



By ll3T9t of Lemma |330] one obtains 



dv ^ 
dir 

dv ^ 
d-K 



3=1 k=j+i 



Ml 



By tine reversibility and ii-exponential convergence of K we get from Proposition 13.241 
that (3 = ||-P|1lo_j.^o < a. Then by applying the Cauchy-Schwarz inequality (CS) one has 

WgP'-'gl < II.9II2 WP'-'gh < II.9II2 Wp'^'Lo^lo < II5II2 • 

Leteo = """MIIx^ - ill .Then 

^' \\ an Woo 



J2\Lj+noig')\+2j2 E \L,+noigP'-'g)\ 
j=i j=i k=j+i 

n n—1 n 

<eo||5ll2E"'+2^o||5ll2E E 

(n n—1 n 

E"^ + 2;^ E 
j=i j=i k=j+i 

= So ■ U{a,n) ■ \\g\\l < Bq • ?7(a,n) • ||/||2. 



Thus the proof is completed. 



□ 



Remark 3.32. The function $U(\alpha,n)$ was already studied in Lemma 2.19. For convenience let us repeat the result. For all $n\in\mathbb{N}$ we have
$$U(\alpha,n) \le \frac{2}{(1-\alpha)^2}.$$
Then, from Lemma 3.31 it follows that
$$e_\nu(S_{n,n_0},f)^2 \le e_\pi(S_n,f)^2 + \frac{2M\left\|\frac{d\nu}{d\pi}-1\right\|_\infty \alpha^{n_0}}{n^2(1-\alpha)^2}\,\|f\|_2^2.$$
If the initial distribution is $\pi$, then one has the error formula of Proposition 3.26.
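The quantity $U(\alpha,n)$ and the resulting burn-in correction (3.20) are easy to evaluate directly. The following Python sketch computes $U(\alpha,n)$ from its definition in Lemma 3.31 and compares it with the $n$-independent bound of Remark 3.32. The function names (`U`, `U_bound`, `burn_in_term`) and the sample parameter values are illustrative assumptions, not part of the text.

```python
def U(alpha, n):
    """U(alpha, n) = sum_{j=1}^n alpha^j + 2 * sum_{j=1}^{n-1} sum_{k=j+1}^n alpha^k."""
    single = sum(alpha ** j for j in range(1, n + 1))
    double = sum(alpha ** k for j in range(1, n) for k in range(j + 1, n + 1))
    return single + 2.0 * double


def U_bound(alpha):
    """The n-independent bound 2 / (1 - alpha)^2 quoted in Remark 3.32."""
    return 2.0 / (1.0 - alpha) ** 2


def burn_in_term(n, n0, alpha, M, dens_dev_inf, f_norm=1.0):
    """Right-hand side of (3.20); dens_dev_inf stands for ||dnu/dpi - 1||_inf."""
    return U(alpha, n) / n ** 2 * alpha ** n0 * M * dens_dev_inf * f_norm ** 2


if __name__ == "__main__":
    # hypothetical parameters: alpha = 0.5, M = 3, ||dnu/dpi - 1||_inf = 1999
    print(burn_in_term(n=100, n0=13, alpha=0.5, M=3.0, dens_dev_inf=1999.0))
```

For $\alpha = 1/2$ the value $U(\alpha,n)$ approaches $\alpha(1+\alpha)/(1-\alpha)^2 = 3$, well below the bound $2/(1-\alpha)^2 = 8$.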

Remark 3.33. Note that in Lemma 3.31 reversibility of $K$ was essentially used to apply Proposition 3.24. If the Markov operator is normal, i.e. $PP^* = P^*P$, then one has by Proposition 3.11 that $\beta = \|P\|_{L_2^0\to L_2^0} \le \sqrt{\alpha}$. By this observation we get a very similar estimate as in Lemma 3.31 for normal Markov operators which are not necessarily reversible. The only difference to (3.20) is that $\alpha$ has to be replaced by $\sqrt{\alpha}$. Then
$$U(\sqrt{\alpha},n) \le \frac{2}{(1-\sqrt{\alpha})^2} \le \frac{8}{(1-\alpha)^2}.$$
The last inequality is implied by $1 - a^r \ge r(1-a)$ for $r\in[0,1]$ (here with $a = \alpha$ and $r = 1/2$), which is a consequence of the Bernoulli inequality with real exponent.¹
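The two elementary inequalities used in Remark 3.33 can be checked numerically on a grid. This is only a sanity check of $1 - a^r \ge r(1-a)$ and of $2/(1-\sqrt{\alpha})^2 \le 8/(1-\alpha)^2$, not a proof; the helper names are hypothetical.

```python
import math


def bernoulli_gap(a, r):
    """(1 - a**r) - r*(1 - a); Remark 3.33 uses that this is >= 0 for a, r in [0, 1]."""
    return (1.0 - a ** r) - r * (1.0 - a)


def sqrt_alpha_bounds(alpha):
    """Return (2/(1 - sqrt(alpha))^2, 8/(1 - alpha)^2) for alpha in [0, 1)."""
    return 2.0 / (1.0 - math.sqrt(alpha)) ** 2, 8.0 / (1.0 - alpha) ** 2


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.9, 0.99):
        print(alpha, sqrt_alpha_bounds(alpha))
```

Applying the first inequality with $r = 1/2$ gives $1-\sqrt{\alpha} \ge (1-\alpha)/2$, which is exactly the step from $2/(1-\sqrt{\alpha})^2$ to $8/(1-\alpha)^2$.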

The next theorem summarizes the main result for a Markov chain with a reversible and $L_1$-exponentially convergent transition kernel.

Theorem 3.34. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition kernel $K$ and initial distribution $\nu$. Let $K$ be reversible with respect to $\pi$ and $L_1$-exponentially convergent with $(\alpha, M)$. Let $f\in L_2$ and assume that the probability measure $\nu\in\mathcal{M}_\infty$. Then
$$e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\alpha)}\|f\|_2^2 + \frac{2M\left\|\frac{d\nu}{d\pi}-1\right\|_\infty \alpha^{n_0}}{n^2(1-\alpha)^2}\|f\|_2^2 \qquad (3.21)$$
and for $g = f - S(f)$ we have
$$\lim_{n\to\infty} n\cdot e_\nu(S_{n,n_0},f)^2 = \lim_{n\to\infty} n\cdot e_\pi(S_n,f)^2 = \langle (I+P)(I-P)^{-1}g, g\rangle. \qquad (3.22)$$

Proof. By Lemma 3.31 and Lemma 2.19 the first equality of (3.22) holds. By the reversibility of the transition kernel, Proposition 3.26 applies, so that
$$\lim_{n\to\infty} n\cdot e_\pi(S_n,f)^2 = \lim_{n\to\infty}\frac{1}{n}\langle W(n,P)g,g\rangle = \langle (I+P)(I-P)^{-1}g,g\rangle.$$
The rest follows via Lemma 3.31, Corollary 3.27 and Lemma 2.19. □
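To see how the two terms of (3.21) trade off, the bound can be written as a small function. The sketch below assumes that $\alpha$, $M$ and $\|\frac{d\nu}{d\pi}-1\|_\infty$ are known numbers. The name `theorem_334_bound` and the sample values are hypothetical.

```python
def theorem_334_bound(n, n0, alpha, M, dens_dev_inf, f_norm=1.0):
    """Right-hand side of (3.21): an upper bound on e_nu(S_{n,n0}, f)^2.

    alpha, M     : parameters of L1-exponential convergence (assumed known)
    dens_dev_inf : ||dnu/dpi - 1||_inf for the initial distribution nu
    f_norm       : ||f||_2
    """
    main = 2.0 / (n * (1.0 - alpha))                      # does not depend on the burn-in
    burn = 2.0 * M * dens_dev_inf * alpha ** n0 / (n ** 2 * (1.0 - alpha) ** 2)
    return (main + burn) * f_norm ** 2


if __name__ == "__main__":
    # hypothetical values: alpha = 1/2, M = 3, ||dnu/dpi - 1||_inf = 1999
    for n0 in (0, 5, 13):
        print(n0, theorem_334_bound(1000, n0, 0.5, 3.0, 1999.0))
```

The burn-in only affects the $n^{-2}$ term. Multiplying the bound by $n$ and letting $n\to\infty$ gives $2/(1-\alpha)$, which is an upper bound for the limit in (3.22) when $\|f\|_2 \le 1$.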

Remark 3.35. Under the assumptions of Theorem 3.34 one has by Proposition 3.24 that $\pi$-a.e. uniform ergodicity with $(\alpha, M)$ is equivalent to $L_1$-exponential convergence with $(\alpha, 2M)$. Hence one can restate Theorem 3.34 for uniformly ergodic Markov chains and obtain the same result with $M$ replaced by $2M$. This is the general state space counterpart to
Theorem |2.20[ where M is of the magnitude of ||:^||^ and /3 = a. 
Furthermore, note that if the Markov operator is normal but not necessarily reversible, then one can get a similar error bound by using Remark 3.33.



¹ The Bernoulli inequality with real exponent $r\in[0,1]$ states for any real number $x > -1$ that $(1+x)^r \le 1 + rx$.
Remark 3.36. The error bound (3.21) might be interpreted as follows: the burn-in $n_0$ serves to eliminate the influence of the initial distribution, while $n$ has to decrease $e_\pi(S_n,f)$. For large $n$ the error behaves exactly like the error obtained when starting with the stationary distribution. Hence the bias of the initial distribution disappears after sufficiently many steps. If the initial distribution coincides with the stationary one, then the bias of the initial part vanishes completely.

Another consequence of Lemma 3.31 and Lemma 2.19 is the following result concerning the asymptotic error for $\|f\|_2 \le 1$.

Corollary 3.37. Under the same assumptions as in Theorem 3.34, it follows that
$$\lim_{n\to\infty} n\cdot\sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f)^2 = \frac{1+\Lambda}{1-\Lambda}$$
and
$$\lim_{n_0\to\infty}\sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f)^2 = \frac{1+\Lambda}{n(1-\Lambda)} - \frac{2\Lambda(1-\Lambda^n)}{n^2(1-\Lambda)^2}.$$



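Proof. Let us define
$$c_{n,n_0} = \frac{2\alpha^{n_0} M}{n^2(1-\alpha)^2}\left\|\frac{d\nu}{d\pi}-1\right\|_\infty.$$
One has $\lim_{n\to\infty} n\cdot c_{n,n_0} = 0$ and $\lim_{n_0\to\infty} c_{n,n_0} = 0$. For $\|f\|_2 \le 1$ we obtain by Lemma 3.31 and Lemma 2.19 that
$$|e_\nu(S_{n,n_0},f)^2 - e_\pi(S_n,f)^2| \le c_{n,n_0}.$$
Hence
$$\sup_{\|f\|_2\le1} e_\pi(S_n,f)^2 - c_{n,n_0} \le \sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f)^2 \le \sup_{\|f\|_2\le1} e_\pi(S_n,f)^2 + c_{n,n_0}. \qquad (3.23)$$
Recall that $\Lambda = \sup\{a \mid a\in\operatorname{spec}(P|_{L_2^0})\}$. Then by Corollary 3.27 we have
$$\sup_{\|f\|_2\le1} e_\pi(S_n,f)^2 = \frac{1+\Lambda}{n(1-\Lambda)} - \frac{2\Lambda(1-\Lambda^n)}{n^2(1-\Lambda)^2}.$$
By taking the limits in (3.23) the assertions are proven. □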
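The closed form from Corollary 3.27 used above can be checked on a small example. The sketch assumes a symmetric stochastic matrix, which is reversible with respect to the uniform distribution. It compares the closed form with the largest eigenvalue of $\frac{1}{n^2}\big(nI + 2\sum_{k=1}^{n-1}(n-k)P^k\big)$ on mean-zero vectors. That matrix expression is our own way of writing $n^2 e_\pi(S_n,\cdot)^2$ as a quadratic form. The helper names are hypothetical.

```python
import numpy as np


def sup_error_sq_closed_form(lam, n):
    """(1+lam)/(n(1-lam)) - 2 lam (1-lam^n)/(n^2 (1-lam)^2), cf. Corollary 3.27."""
    return (1 + lam) / (n * (1 - lam)) - 2 * lam * (1 - lam ** n) / (n ** 2 * (1 - lam) ** 2)


def sup_error_sq_matrix(P, n):
    """sup_{||f||_2 <= 1} e_pi(S_n, f)^2 for a symmetric stochastic matrix P (uniform pi).

    e_pi(S_n, f)^2 = <A g, g>_pi with A = (n I + 2 sum_{k=1}^{n-1} (n-k) P^k) / n^2 and
    g = f - S(f), so the supremum is the largest eigenvalue of A on mean-zero vectors.
    """
    d = P.shape[0]
    A = n * np.eye(d)
    Pk = np.eye(d)
    for k in range(1, n):
        Pk = Pk @ P
        A += 2 * (n - k) * Pk
    proj = np.eye(d) - np.full((d, d), 1.0 / d)  # orthogonal projection onto mean-zero vectors
    A = proj @ A @ proj / n ** 2
    return float(np.linalg.eigvalsh((A + A.T) / 2).max())


if __name__ == "__main__":
    P = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
    lam = np.sort(np.linalg.eigvalsh(P))[-2]  # Lambda = 0.25 for this matrix
    print(sup_error_sq_matrix(P, 10), sup_error_sq_closed_form(lam, 10))
```

For this matrix the spectrum is $\{1, 1/4, 1/4\}$, so $\Lambda = 1/4$ and both printed numbers should agree up to rounding.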

In many examples it is known that the transition kernel is $L_1$-exponentially convergent or $\pi$-a.e. uniformly ergodic, but it is very difficult to obtain reasonable values of $(\alpha, M)$ explicitly. Then at least the asymptotic result can be used. This is similar to results of [Sok97, Bil99, Mat99].

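Remark 3.38. Observe that we have a lower and an upper bound of the error of $S_{n,n_0}$. Exactly as in Remark 2.24 one obtains by (3.23) that
$$\frac{1+\Lambda}{n(1-\Lambda)} - \frac{2\Lambda(1-\Lambda^n)}{n^2(1-\Lambda)^2} - c_{n,n_0} \le \sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\Lambda)} + c_{n,n_0}.$$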
We showed an error bound of $S_{n,n_0}$ with respect to $\|\cdot\|_2$ for Markov chains which are reversible and $L_1$-exponentially convergent. The condition of $L_1$-exponential convergence is rather restrictive. This motivates the study of Markov chains which satisfy a weaker convergence property, namely we assume that there is an $L_2$-spectral gap, i.e. $1 - \beta > 0$. This is enough to obtain error bounds for integrands $f\in L_p$ with $p\in(2,\infty]$. The following lemmas lead to the fact that there exists a constant $C_{\nu,\beta,p}$, independent of $n_0$ and $n$, such that

|e^(S'„.„o,/)^ - e^(S'„,/)^| < C^^p^p 11/11 — . 

Note that it is not assumed that the Markov chain is reversible with respect to $\pi$.

Lemma 3.39. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition kernel $K$ and initial distribution $\nu$. Let $\pi$ be a stationary distribution of $K$. Let $f\in L_p$, let $\nu\in\mathcal{M}_{\max\{2,\,p/(p-2)\}}$, $p\in(2,\infty]$ and

V{P,n,p)=A 



f j:;^^ + 2^ P'^"^ EL^+i P''^^, P G (2, 4), 

pe[4,oo]. 



\ 2 /3^- + 2^ p'^^' 

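(i) If $p\in(2,4)$, then
$$|e_\nu(S_{n,n_0},f)^2 - e_\pi(S_n,f)^2| \le \frac{V(\beta,n,p)}{n^2}\,\beta^{2n_0\frac{p-2}{p}}\left\|\frac{d\nu}{d\pi}-1\right\|_{\frac{p}{p-2}}\|f\|_p^2.$$

(ii) If $p\in[4,\infty]$, then
$$|e_\nu(S_{n,n_0},f)^2 - e_\pi(S_n,f)^2| \le \frac{V(\beta,n,p)}{n^2}\,\beta^{n_0}\left\|\frac{d\nu}{d\pi}-1\right\|_2\|f\|_p^2.$$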

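Proof. First, let $g = f - S(f)$ and observe that for $p \ge 1$ one obtains
$$\|g\|_p \le \|f\|_p + |S(f)| \le \|f\|_p + \|f\|_1 \le 2\|f\|_p. \qquad (3.24)$$
Equation (3.17) implies
$$|e_\nu(S_{n,n_0},f)^2 - e_\pi(S_n,f)^2| \le \frac{1}{n^2}\sum_{j=1}^{n}|L_{j+n_0}(g^2)| + \frac{2}{n^2}\sum_{j=1}^{n-1}\sum_{k=j+1}^{n}|L_{j+n_0}(gP^{k-j}g)|. \qquad (3.25)$$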



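Let $p\in(2,4)$. Then it follows by (3.18) with $r = \frac{p}{2}$ and $\frac{r}{r-1} = \frac{p}{p-2}$ that
$$|L_{j+n_0}(g^2)| \le 2^{4/p}\,\beta^{j\frac{2(p-2)}{p}}\,\beta^{2n_0\frac{p-2}{p}}\left\|\frac{d\nu}{d\pi}-1\right\|_{\frac{p}{p-2}}\|g\|_p^2,$$
and in the same way
$$|L_{j+n_0}(gP^{k-j}g)| \le 2^{4/p}\,\beta^{j\frac{2(p-2)}{p}}\,\beta^{2n_0\frac{p-2}{p}}\left\|\frac{d\nu}{d\pi}-1\right\|_{\frac{p}{p-2}}\|gP^{k-j}g\|_{p/2}.$$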



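By applying the Cauchy-Schwarz inequality (CS) one obtains
$$\|gP^{k-j}g\|_{p/2} \overset{(CS)}{\le} \|g\|_p\,\|P^{k-j}g\|_p \le \|g\|_p^2\,\|P^{k-j}\|_{L_p^0\to L_p^0},$$
and the operator norm on the right-hand side is estimated by (3.7).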
Let $\varepsilon_0(p) = \beta^{2n_0\frac{p-2}{p}}\left\|\frac{d\nu}{d\pi}-1\right\|_{\frac{p}{p-2}}$. Then

p-2 



j+no 

3=1 i=i fc=i+i 



n-l 



n n— 1 

■■ p-2 3p + 2 .r-^ p-3 



<^F(/3,n,p)-£ob)||/||p. 



Thus, claim (i) is shown.

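Let us turn to (ii), i.e. $p\in[4,\infty]$. Equation (3.18) with $r = 2$ is used to get
$$|L_{j+n_0}(g^2)| \le 2\beta^{j+n_0}\left\|\frac{d\nu}{d\pi}-1\right\|_2\|g^2\|_2, \qquad |L_{j+n_0}(gP^{k-j}g)| \le 2\beta^{j+n_0}\left\|\frac{d\nu}{d\pi}-1\right\|_2\|gP^{k-j}g\|_2.$$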



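By Hölder's inequality (HI) with conjugate parameters $\frac{p}{2}$ and $\frac{p}{p-2}$ one obtains
$$\|gP^{k-j}g\|_2 \overset{(HI)}{\le} \|g\|_p\,\|P^{k-j}g\|_{\frac{2p}{p-2}} \le \|g\|_p\,\|P^{k-j}\|_{L^0_{2p/(p-2)}\to L^0_{2p/(p-2)}}\,\|g\|_{\frac{2p}{p-2}} \le \|g\|_p^2\,\|P^{k-j}\|_{L^0_{2p/(p-2)}\to L^0_{2p/(p-2)}},$$
where the remaining operator norm is estimated by (3.7).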

Note that in the third inequality of the last estimate it was essential that $p\in[4,\infty]$, in order to use $\|g\|_{\frac{2p}{p-2}} \le \|g\|_p$. Thus, for $\varepsilon_0 = \left\|\frac{d\nu}{d\pi}-1\right\|_2$ one has



<eo||g||p 2^/3^+2^^/?2J> ^ <V{P,n,p)-eo\\f\\;. 



Finally, by substituting this in (3.25), everything is shown. □



Let us consider $V(\beta,n,p)$. If $p\in(2,\infty]$ and $1-\beta > 0$, then we show that the mapping $n\mapsto V(\beta,n,p)$ is bounded.

Lemma 3.40. Let $p\in(2,\infty]$ and $1-\beta > 0$. For all $n\in\mathbb{N}$ we obtain
$$V(\beta,n,p) \le \frac{64p}{(p-2)(1-\beta)^2}. \qquad (3.26)$$
Proof. The inequalities indicated by $(*)$ follow from $1-\beta^r \ge r(1-\beta)$ for $r\in[0,1]$. First, let $p\in(2,4)$. By the geometric series one can estimate

n n— 1 n—j — 1 

j=i i=i fe=o 

j=i i=i 

^7927; JE/^'" ^T^fl^E/^''' 

^ 16 ^ ^ 16p 



< 



p6(^,4) (1 - - ^2(p-2)/p) _ 2)(1 - 13)2 (p - 2)(1 - ' 

For $p\in[4,\infty]$, again by the geometric series, we can estimate

n n— 1 n—j — 1 

E3p+2 p-2 X — ^ X — ^ J p — 2 

j=i i=i fe-o 



/ 3p+2 p-2 \ „ / p-2 3p + 2 \ 



^3 



pe[4,oo] 1-^^ ~ (l-/3)(l-/3'^) W (j3-2)(l-/3)2' 

This completes the proof. □ 

The main error bound of $S_{n,n_0}$ for Markov chains with an $L_2$-spectral gap is presented in the next theorem.

Theorem 3.41. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition kernel $K$ and initial distribution $\nu$. Let $\pi$ be a stationary distribution of $K$. For $p\in(2,\infty]$ let $f\in L_p$ and $\nu\in\mathcal{M}_{\max\{2,\,p/(p-2)\}}$. Suppose that the Markov operator has an $L_2$-spectral gap, i.e. $1-\beta > 0$. Then we have



e(S n^<e(S ^^Pll/llp ^'"""^llf -1|U^ ^^^(2,4), 

n2(p-2)(l-/3)2 |^n„||rf^_l||^^ pe[4,oo], 

where 



I / j 1 p , ifK is reversible with respect to n, 



,T^\\f\\p' Otherwise. 
Furthermore
$$\lim_{n\to\infty} n\cdot e_\nu(S_{n,n_0},f)^2 = \lim_{n\to\infty} n\cdot e_\pi(S_n,f)^2, \qquad (3.27)$$
and if $K$ is reversible with respect to $\pi$, then (3.27) is equal to
$$\langle (I+P)(I-P)^{-1}g, g\rangle, \qquad \text{where } g = f - S(f).$$



Proof. By Lemma 3.39 and Lemma 3.40 the equality (3.27) is true. If the transition kernel is reversible, then by Proposition 3.26 the asymptotic result holds since
$$\lim_{n\to\infty} n\cdot e_\pi(S_n,f)^2 = \lim_{n\to\infty}\frac{1}{n}\langle W(n,P)g,g\rangle = \langle (I+P)(I-P)^{-1}g,g\rangle.$$
By Lemma 3.39 and Lemma 3.40 one obtains the estimate of $e_\nu(S_{n,n_0},f)^2$. The estimate of $e_\pi(S_n,f)^2$ follows by Proposition 3.28 and, for a reversible transition kernel, by Corollary 3.27. □

Remark 3.42. A large burn-in $n_0$ guarantees that the influence of the initial distribution disappears, and a large $n$ makes $e_\pi(S_n,f)$ small. The condition of $L_1$-exponential convergence can be replaced by the existence of an $L_2$-spectral gap, at the price of considering error bounds in terms of $L_p$-norms of the integrand for $p\in(2,\infty]$. If $p$ converges to 2, then the bound goes to infinity. However, for $p > 2$ one has an explicit error bound. If the initial and the stationary distribution coincide, then the influence of the initial part vanishes for all $p\in(2,\infty]$.

Remark 3.43. Let 



Observe that this implies a lower error bound for $S_{n,n_0}$. We do not use it because of the lack of a general lower bound of $\sup_{\|f\|_p\le1} e_\pi(S_n,f)^2$ for $p\in(2,\infty]$.
Remark 3.44. Let $K$ be a transition kernel which is reversible with respect to $\pi$. We use the notation $\beta_K = \beta$ and $\Lambda_K = \Lambda$ to indicate the transition kernel. The lazy version of $K$ is given by $\tilde K(x,A) = \frac12\big(K(x,A) + \mathbf{1}_A(x)\big)$. Then one has
$$\beta_{\tilde K} = \frac{1+\Lambda_K}{2}.$$
If one has an estimate of $\Lambda_K$, then one also has an estimate of $\beta_{\tilde K}$ and one can apply Theorem 3.41. There are some techniques, e.g. canonical paths (see [Yue00]) and the conductance concept (see [LS88, LS93] and [JS89, DS91]), which are helpful to estimate $\Lambda_K$. However, in general this is a challenging task.



For $\|f\|_p \le 1$ we have by Lemma 3.39 and Lemma 3.30 that




(2,4), 

pe [4,oo]. 



ip)- 
3.3. Burn-in

Assume that computational resources for $N = n + n_0$ steps of the Markov chain are available. The burn-in $n_0$ and the sample size $n$ should be chosen such that the error bound is as small as possible. One encounters the same trade-off as for finite state spaces. In the next statement the error bound for an explicit burn-in is stated.

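Theorem 3.45.

(i) Suppose that we have a Markov chain which is reversible with respect to $\pi$ and $L_1$-exponentially convergent with $(\alpha, M)$. Let
$$n_0 = \max\left\{\left\lceil\frac{\log\big(M\left\|\frac{d\nu}{d\pi}-1\right\|_\infty\big)}{\log(\alpha^{-1})}\right\rceil,\ 0\right\}.$$
Then
$$\sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\beta)} + \frac{2}{n^2(1-\alpha)^2} \le \frac{2}{n(1-\alpha)} + \frac{2}{n^2(1-\alpha)^2}.$$

(ii) Suppose that we have a Markov chain with Markov operator $P$ which has an $L_2$-spectral gap $1-\beta > 0$. For $p\in(2,\infty]$ let $n_0(p)$ be the smallest natural number (including zero) which is greater than or equal to
$$\begin{cases}\dfrac{p}{2(p-2)}\,\dfrac{\log\Big(\frac{32p}{p-2}\left\|\frac{d\nu}{d\pi}-1\right\|_{\frac{p}{p-2}}\Big)}{\log(\beta^{-1})}, & p\in(2,4),\\[2ex] \dfrac{\log\Big(64\left\|\frac{d\nu}{d\pi}-1\right\|_2\Big)}{\log(\beta^{-1})}, & p\in[4,\infty].\end{cases}$$
Then
$$\sup_{\|f\|_p\le1} e_\nu(S_{n,n_0(p)},f)^2 \le \frac{2}{n(1-\beta)} + \frac{2}{n^2(1-\beta)^2}.$$

Proof. Assertion (i) follows from Theorem 3.34 and Proposition 3.24. Claim (ii) is an application of Theorem 3.41. □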

Note that $\log(\beta^{-1}) = (1-\beta) + \sum_{k=2}^{\infty}\frac{(1-\beta)^k}{k}$, hence $\log(\beta^{-1}) \ge 1-\beta$. This can be used to estimate the suggested burn-in. The suggestions of the burn-in are justified in the following.

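For simplicity we assume that $\alpha = \beta$. Let us define
$$C(p) = \begin{cases} M\left\|\frac{d\nu}{d\pi}-1\right\|_\infty, & p = 2,\\[1ex] \frac{32p}{p-2}\left\|\frac{d\nu}{d\pi}-1\right\|_{\frac{p}{p-2}}, & p\in(2,4),\\[1ex] 64\left\|\frac{d\nu}{d\pi}-1\right\|_2, & p\in[4,\infty].\end{cases}$$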



We consider numerical experiments under the following conditions. Suppose that

• the computational resources are either $N = 10^5$ or $N = 10^6$,

• $\beta = 0.9$ or $\beta = 0.99$ or $\beta = 0.999$,

• $C = C(p) = 10^{30}$, independent of $p$.
Then the suggestion of the burn-in of Theorem 3.45 for $p = 2$ and $p\in[4,\infty]$ has the form
$$n_0^{\{2\}\cup[4,\infty)} = \left\lceil\frac{\log(C)}{\log(\beta^{-1})}\right\rceil,$$
whereas for $p\in(2,4)$ it still depends on $p$, such that
$$n_0^{(2,4)} = \left\lceil\frac{p}{2(p-2)}\,\frac{\log(C)}{\log(\beta^{-1})}\right\rceil.$$
The squared error for $\|f\|_p \le 1$, where $p\in\{2\}\cup[4,\infty)$, is bounded by
$$\mathrm{est}_{\{2\}\cup[4,\infty)}(n,n_0) = \frac{2}{n(1-\beta)} + \frac{2C\beta^{n_0}}{n^2(1-\beta)^2},$$
whereas for $p\in(2,4)$ we have the upper estimate
$$\mathrm{est}_{(2,4)}(n,n_0) = \frac{2}{n(1-\beta)} + \frac{2C\beta^{n_0\frac{2(p-2)}{p}}}{n^2(1-\beta)^2}.$$
With the restriction $N = n + n_0$ one can numerically compute a burn-in which approximately minimizes the upper error bound. This is a one-dimensional minimization problem with different parameters. Let us denote the numerically computed values of the burn-in by $n_{\mathrm{opt}}^{\{2\}\cup[4,\infty)}$ for $p\in\{2\}\cup[4,\infty)$ and by $n_{\mathrm{opt}}^{(2,4)}$ for $p\in(2,4)$, respectively.



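| $N$ | $\beta$ | $n_{\mathrm{opt}}^{\{2\}\cup[4,\infty)}$ (by Maple) | $\left\lceil\frac{\log(C)}{\log(\beta^{-1})}\right\rceil$ (suggested above) | $n_{\mathrm{opt}}^{(2,4)}$ (by Maple) | $\left\lceil\frac{p\log(C)}{2(p-2)\log(\beta^{-1})}\right\rceil$ (suggested above, $p = 2.1$) |
|---|---|---|---|---|---|
| $10^5$ | 0.9 | 656 | 656 | 6655 | 6885 |
| $10^6$ | 0.9 | 656 | 656 | 6655 | 6885 |
| $10^5$ | 0.99 | 6873 | 6874 | 69642 | 72169 |
| $10^6$ | 0.99 | 6874 | 6874 | 69715 | 72169 |
| $10^5$ | 0.999 | 68977 | 69043 | 79011 | 724952 |
| $10^6$ | 0.999 | 69041 | 69043 | 699520 | 724952 |

Table 3.1: For $C = 10^{30}$ and $p = 2.1$, the numerically computed value $n_{\mathrm{opt}}$ which approximately minimizes the mapping $n_0\mapsto\mathrm{est}_{\mathrm{Int}}(N-n_0,n_0)$, either for $\mathrm{Int} = \{2\}\cup[4,\infty)$ or for $\mathrm{Int} = (2,4)$.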

Table 3.1 gives a collection of $n_{\mathrm{opt}}^{\{2\}\cup[4,\infty)}$ and $n_{\mathrm{opt}}^{(2,4)}$, where $p = 2.1$. The suggested $n_0$ of Theorem 3.45 is close to the numerically computed values of the burn-in which approximately minimize the error bound. For $N = 10^5$ and $\beta = 0.999$ the difference between $n_{\mathrm{opt}}^{(2,4)}$ and $n_0^{(2,4)}$ is large. In this situation Theorem 3.41 gives, for no choice of $n$ and $n_0$ with $N = 10^5$, an error smaller than 1. The available resources $N = n + n_0$ are too small, such that the suggested burn-in cannot be reached. If the computational resources are large enough, then the computed values $n_{\mathrm{opt}}^{\{2\}\cup[4,\infty)}$ and $n_{\mathrm{opt}}^{(2,4)}$ are of the same magnitude as the suggested $n_0^{\{2\}\cup[4,\infty)}$ and $n_0^{(2,4)}$.
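The suggested burn-in and the numerically optimal burn-in of Table 3.1 can be reproduced with a brute-force search. The sketch below is a hypothetical re-implementation; the table itself was computed with Maple. It evaluates $\mathrm{est}_{\mathrm{Int}}(N-n_0,n_0)$ for every $n_0 < N$ and returns the minimizer. With $C = 10^{30}$ and $\beta = 0.9$ the suggested value $\lceil\log C/\log\beta^{-1}\rceil$ is 656, as in the table.

```python
import math


def est(n, n0, beta, C, p=None):
    """Squared-error bound est_Int(n, n0); p=None stands for Int = {2} u [4, inf),
    a value p in (2, 4) selects Int = (2, 4)."""
    exponent = n0 if p is None else n0 * 2.0 * (p - 2.0) / p
    return 2.0 / (n * (1.0 - beta)) + 2.0 * C * beta ** exponent / (n ** 2 * (1.0 - beta) ** 2)


def suggested_burn_in(beta, C, p=None):
    """ceil(log C / log(1/beta)); for p in (2, 4) the quotient is multiplied by p/(2(p-2))."""
    quotient = math.log(C) / math.log(1.0 / beta)
    if p is not None:
        quotient *= p / (2.0 * (p - 2.0))
    return max(math.ceil(quotient), 0)


def optimal_burn_in(N, beta, C, p=None):
    """Brute-force minimizer of n0 -> est(N - n0, n0) over n0 = 0, ..., N - 1."""
    return min(range(N), key=lambda n0: est(N - n0, n0, beta, C, p))


if __name__ == "__main__":
    N, beta, C = 10 ** 5, 0.9, 1e30
    print(suggested_burn_in(beta, C), optimal_burn_in(N, beta, C))
```

A brute-force scan is used only because it is transparent. Since $\mathrm{est}$ is smooth in $n_0$, a golden-section search would find the same minimizer faster.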
If an error of at most $\varepsilon\in(0,1)$ is desired, then the suggested choice $n_0^{\{2\}\cup[4,\infty)}$ or $n_0^{(2,4)}$ of the burn-in, depending on $p$, is independent of the precision $\varepsilon$. We choose $n_0$ as suggested in Theorem 3.45 and
$$n \ge \frac{1+\sqrt{1+4\varepsilon^2}}{(1-\beta)\,\varepsilon^2}$$
to achieve $e_\nu(S_{n,n_0},f) \le \varepsilon$.



Let the Markov chain be reversible with respect to $\pi$ and let $\Lambda = \beta$. For different fixed values $n_0$ a plot of
$$N\mapsto\mathrm{est}_{\{2\}\cup[4,\infty)}(N-n_0,n_0) \quad\text{and}\quad N\mapsto\sup_{\|f\|_2\le1} e_\pi(S_N,f)^2 = \frac{1+\Lambda}{N(1-\Lambda)} - \frac{2\Lambda(1-\Lambda^N)}{N^2(1-\Lambda)^2}$$
is presented in Figure 3.3. Roughly speaking, one can see that if the burn-in is chosen too small a vertical shift takes place, and if the burn-in is chosen too large a horizontal shift takes place. In summary, if $\beta$, $C$ and $p$ are given, then choose the burn-in as suggested above.

Figure 3.3: For $\beta = \Lambda = 0.99$ and $C = 10^{30}$ the mapping $N\mapsto\mathrm{est}_{\{2\}\cup[4,\infty)}(N-n_0,n_0)$ is plotted for different values of $n_0$. The dotted curve is a plot of the mapping $N\mapsto\sup_{\|f\|_2\le1} e_\pi(S_N,f)^2$.

If there is an estimate of $\log(C)/\log(\beta^{-1})$, then one should ensure that it is not smaller than the real quotient. As seen in Figure 3.3, if it is slightly smaller, there is already a strong influence. Choosing the burn-in too large has a less severe effect.
If nothing is known about $\beta$ or $C$, another strategy is to choose $n = n_0 = N/2$ for even $N$. This has the advantage that no information about $\beta$ or $C$ is needed. In Figure 3.4 we plotted
$$N\mapsto\mathrm{est}_{\{2\}\cup[4,\infty)}(N/2,N/2),\qquad N\mapsto\mathrm{est}_{\{2\}\cup[4,\infty)}\big(N-n_0^{\{2\}\cup[4,\infty)},\,n_0^{\{2\}\cup[4,\infty)}\big)\qquad\text{and}\qquad N\mapsto\sup_{\|f\|_2\le1} e_\pi(S_N,f)^2$$
over the plotted range of $N$. Asymptotically the price of a factor of $\sqrt{2}$ is paid, i.e. asymptotically the error is $\sqrt{2}$ times worse than $\sup_{\|f\|_2\le1} e_\pi(S_N,f)$, see Figure 3.4. This strategy works well and reaches the same rate of convergence as in Theorem 3.45.

Figure 3.4: For $\beta = \Lambda = 0.99$ and $C = 10^{30}$ the mapping $N\mapsto\mathrm{est}_{\{2\}\cup[4,\infty)}(N-n_0,n_0)$ is plotted for different values of $n_0$ (horizontal axis $N = n_0 + n$). The dotted curve is the plot of the mapping $N\mapsto\sup_{\|f\|_2\le1} e_\pi(S_N,f)$.

3.4. Examples 

For the examples in Section 2.4 one can provide all eigenfunctions and eigenvalues. Usually it is a challenging task to obtain the necessary information on the spectral structure of the Markov operator, in particular on general state spaces. This section contains examples to illustrate the error bounds. The literature provides some tools which can be applied to estimate the quantities of interest, e.g. $\Lambda$ and $\beta$. These tools are briefly introduced; for further details we refer to the literature. Note that the initial distributions of the Markov chains in the following examples are chosen to demonstrate the error bounds, not to minimize the burn-in.
Bounded state spaces



Suppose that the state space $D$ is a measurable subset of $\mathbb{R}^d$. The $\sigma$-algebra $\mathcal{D}$ is given by $\mathcal{B}(D)$. We say a transition kernel $K$ has a transition density with respect to a positive measure $\mu$ if there is a non-negative function $k\colon D\times D\to[0,\infty]$ such that
$$K(x,A) = \int_A k(x,y)\,\mu(dy), \qquad x\in D,\ A\in\mathcal{B}(D).$$
We write $k^n$ for the transition density of $K^n$.

Let $D$ be a bounded set and let the function $\varrho\colon D\to[0,\infty]$ be integrable with respect to the Lebesgue measure, with $\int_D \varrho(x)\,dx > 0$. Then
$$\pi_\varrho(A) = \frac{\int_A\varrho(x)\,dx}{\int_D\varrho(x)\,dx}, \qquad A\in\mathcal{B}(D),$$
is a probability measure on $(D,\mathcal{B}(D))$. We say $\varrho$ is an unnormalized density with respect to the Lebesgue measure if $\int_D\varrho(x)\,dx \neq 1$. Let $K$ be a transition kernel with transition density $k$ with respect to the Lebesgue measure and assume that $\pi_\varrho$ is a stationary distribution of $K$. Furthermore, let $s\in[0,1]$ and let us define
$$K_s(x,A) = (1-s)K(x,A) + s\,\mathbf{1}_A(x), \qquad x\in D,\ A\in\mathcal{B}(D).$$
The transition kernel $K_s$ is called the $s$-modified transition kernel of $K$. If $s = 1/2$, then $K_s$ is the lazy version of $K$, and if $s = 0$, then one has $K$. For all $s\in[0,1]$ the distribution $\pi_\varrho$ is a stationary distribution of $K_s$. The goal is to approximate
$$S(f) = \int_D f(x)\,\pi_\varrho(dx).$$
One obtains for $n\in\mathbb{N}$ that
$$K_s^n(x,A) = \sum_{j=1}^{n}\binom{n}{j}s^{n-j}(1-s)^j K^j(x,A) + s^n\mathbf{1}_A(x), \qquad x\in D,\ A\in\mathcal{B}(D). \qquad (3.28)$$
The case $s = 0$ is covered if we define $0^0 = 1$. The following lemma provides a condition which implies $L_1$-exponential convergence of the $s$-modified transition kernel.
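Formula (3.28) is the binomial theorem for the commuting operators $K$ and the identity. On a finite state space, where kernels are stochastic matrices, it can be checked directly. The sketch below uses a random $4\times 4$ stochastic matrix as a stand-in for $K$. This is an illustrative assumption; the text works with densities on $D\subset\mathbb{R}^d$.

```python
from math import comb

import numpy as np


def s_modified_power(K, s, n):
    """n-th power of the s-modified kernel K_s = (1 - s) K + s I (finite state space)."""
    Ks = (1.0 - s) * K + s * np.eye(K.shape[0])
    return np.linalg.matrix_power(Ks, n)


def binomial_expansion(K, s, n):
    """Right-hand side of (3.28): sum_{j=1}^n C(n,j) s^(n-j) (1-s)^j K^j + s^n I."""
    total = s ** n * np.eye(K.shape[0])
    for j in range(1, n + 1):
        # for s = 0 only j = n survives, since Python evaluates 0.0 ** 0 as 1.0
        total = total + comb(n, j) * s ** (n - j) * (1.0 - s) ** j * np.linalg.matrix_power(K, j)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = rng.random((4, 4))
    K /= K.sum(axis=1, keepdims=True)  # make the rows probability vectors
    print(np.abs(s_modified_power(K, 0.5, 5) - binomial_expansion(K, 0.5, 5)).max())
```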



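Lemma 3.46. If there exist an $\alpha\in[0,1)$ and $M < \infty$ such that
$$2s^n + \int_D\operatorname*{ess\,sup}_{y\in D}\left|\sum_{j=1}^{n}\binom{n}{j}s^{n-j}(1-s)^j\frac{k^j(x,y)}{\varrho(y)} - (1-s^n)\right|\varrho(x)\,dx \le \alpha^n M, \qquad n\in\mathbb{N},$$
then the transition kernel $K_s$ is $L_1$-exponentially convergent with $(\alpha, M)$.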
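Proof. The Markov operator of $K_s$ is denoted by $P_s$. Then
$$\|(P_s^n - S)f\|_1 = \int_D\left|\int_D f(y)\Big(\sum_{j=1}^{n}\binom{n}{j}s^{n-j}(1-s)^j k^j(x,y)\Big)dy + s^n f(x) - S(f)\right|\varrho(x)\,dx$$
$$\le \int_D\left|\int_D f(y)\Big(\sum_{j=1}^{n}\binom{n}{j}s^{n-j}(1-s)^j\frac{k^j(x,y)}{\varrho(y)} - (1-s^n)\Big)\varrho(y)\,dy\right|\varrho(x)\,dx + s^n\int_D|f(x) - S(f)|\,\varrho(x)\,dx$$
$$\le \|f\|_1\Big(2s^n + \int_D\operatorname*{ess\,sup}_{y\in D}\left|\sum_{j=1}^{n}\binom{n}{j}s^{n-j}(1-s)^j\frac{k^j(x,y)}{\varrho(y)} - (1-s^n)\right|\varrho(x)\,dx\Big),$$
which proves the assertion. □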



For $n = 1$ and $s = 0$ one has a criterion for $L_1$-exponential convergence with $(\alpha, 1)$ of the transition kernel $K$.

Corollary 3.47. If there exists an $\alpha\in[0,1)$ such that
$$\int_D\operatorname*{ess\,sup}_{y\in D}\left|\frac{k(x,y)}{\varrho(y)} - 1\right|\varrho(x)\,dx \le \alpha,$$
then the transition kernel $K$ is $L_1$-exponentially convergent with $(\alpha, 1)$.
Example 1 



Let us present an easy example borrowed from [Ros95, p. 402]. Let $D = [0,1]$ and $\mathcal{D} = \mathcal{B}([0,1])$. The transition kernel is defined by
$$K(x,A) = \int_A\frac{x+y}{x+\frac12}\,dy, \qquad x\in[0,1],\ A\in\mathcal{B}([0,1]).$$
The stationary distribution is given by
$$\pi(A) = \int_A\Big(x+\frac12\Big)dx, \qquad A\in\mathcal{B}([0,1]).$$
The transition kernel $K$ is reversible with respect to $\pi$. These properties can be checked straightforwardly. We have



$$\int_0^1\operatorname*{ess\,sup}_{y\in[0,1]}\left|\frac{k(x,y)}{\varrho(y)} - 1\right|\varrho(x)\,dx = \int_0^1\operatorname*{ess\,sup}_{y\in[0,1]}\frac{|x+y-2xy-\frac12|}{2(y+\frac12)(x+\frac12)}\,\varrho(x)\,dx = \frac{1}{24}.$$



Hence Corollary 3.47 gives that the transition kernel is $L_1$-exponentially convergent with $(1/24, 1)$. Because of the reversibility one can apply Proposition 3.24 and obtains that the transition kernel is $\pi$-a.e. uniformly ergodic with $(1/24, 1/2)$. Furthermore, there exists an $L_2$-spectral gap; one has $\beta \le \alpha = 1/24$.
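The claims that $\pi$ is stationary and that $K$ is reversible can be checked numerically. The sketch assumes the transition density $k(x,y) = (x+y)/(x+\frac12)$ and the stationary density $\varrho(x) = x+\frac12$ as written above. It approximates the relevant integrals with the midpoint rule, which is exact for integrands that are linear in the integration variable, as here. The helper names are hypothetical.

```python
import numpy as np


def k(x, y):
    """Transition density of Example 1 (as written above): (x + y) / (x + 1/2)."""
    return (x + y) / (x + 0.5)


def rho(x):
    """Density x + 1/2 of the stationary distribution pi on [0, 1]."""
    return x + 0.5


def midpoint_checks(m=1000):
    grid = (np.arange(m) + 0.5) / m                 # midpoints of a uniform partition of [0, 1]
    X, Y = np.meshgrid(grid, grid, indexing="ij")   # X[i, j] = x_i, Y[i, j] = y_j
    row_integrals = k(X, Y).mean(axis=1)            # ~ int_0^1 k(x, y) dy, should equal 1
    stationarity = (rho(X) * k(X, Y)).mean(axis=0)  # ~ int_0^1 rho(x) k(x, y) dx, should equal rho(y)
    flux = rho(X) * k(X, Y)                          # rho(x) k(x, y) = x + y, symmetric iff reversible
    return grid, row_integrals, stationarity, flux
```

Symmetry of $\varrho(x)k(x,y) = x+y$ is exactly the detailed balance condition for $K$ and $\pi$.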

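Let $\delta\in(0,2/3)$ and let the initial distribution $\nu$ be given by
$$\nu(A) = \frac{1}{\delta}\int_A\mathbf{1}_{[0,\delta]}(x)\,dx, \qquad A\in\mathcal{B}([0,1]).$$
Hence the initial state is chosen uniformly distributed in $[0,\delta]$. Then
$$\left\|\frac{d\nu}{d\pi}-1\right\|_\infty = \operatorname*{ess\,sup}_{x\in[0,1]}\left|\frac{2\cdot\mathbf{1}_{[0,\delta]}(x)}{\delta(2x+1)} - 1\right| = \frac{2}{\delta}-1.$$
Theorem 3.45 (i) suggests the choice
$$n_0 = \left\lceil\frac{\log(\frac{2}{\delta}-1)}{\log(24)}\right\rceil$$
such that
$$\sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f)^2 \le \frac{48}{23n} + \frac{1152}{529n^2}.$$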



Example 2 



It is taken from [Ros03, p. 172]. Let $D = [-1,1]$ and $\mathcal{D} = \mathcal{B}([-1,1])$. The transition kernel is defined by
$$K(x,A) = \int_A\mathbf{1}_{[-1,0]}(x)\mathbf{1}_{(0,1]}(y) + \mathbf{1}_{(0,1]}(x)\mathbf{1}_{[-1,0]}(y)\,dy, \qquad x\in[-1,1],\ A\in\mathcal{B}([-1,1]).$$
For $x\in[-1,0]$ the next state is uniformly distributed in $(0,1]$, and for $x\in(0,1]$ the next state is uniformly distributed in $[-1,0]$. The transition kernel is reversible with respect to the uniform distribution on $D$, thus $\pi_\varrho$ is given by $\varrho(x) = 1/2$ for $x\in D$. For $n\in\mathbb{N}$ we have
$$K^n(x,A) = \begin{cases}K(x,A), & n\text{ odd},\\ K^2(x,A), & n\text{ even},\end{cases}$$
where
$$K^2(x,A) = \int_A\mathbf{1}_{[-1,0]}(x)\mathbf{1}_{[-1,0]}(y) + \mathbf{1}_{(0,1]}(x)\mathbf{1}_{(0,1]}(y)\,dy, \qquad x\in[-1,1],\ A\in\mathcal{B}([-1,1]).$$
The spectrum of $P$ is completely known; one has $\operatorname{spec}(P|_{L_2}) = \{1,0,-1\}$ with
$$\operatorname{Eig}(P,1) = \{f\in L_2 \mid f = c,\ c\in\mathbb{R}\} = (L_2^0)^\perp,$$
$$\operatorname{Eig}(P,0) = \Big\{f\in L_2 \ \Big|\ \int_{-1}^{0}f(x)\,dx = \int_0^1 f(x)\,dx = 0\Big\},$$
$$\operatorname{Eig}(P,-1) = \{f\in L_2 \mid f(x) = c(\mathbf{1}_{[-1,0]}(x) - \mathbf{1}_{(0,1]}(x)),\ c\in\mathbb{R}\},$$
where $\operatorname{Eig}(P,\lambda)$ denotes the eigenspace of the eigenvalue $\lambda$. Clearly $\operatorname{spec}(P|_{L_2^0}) = \{0,-1\}$. To apply the error bounds one has to pass over to $\tilde K$, the lazy version of $K$. Let $\tilde P$ be the transition operator which corresponds to $\tilde K$. We write $\tilde\beta = \beta_{\tilde K}$ and $\tilde\Lambda = \Lambda_{\tilde K}$ to indicate the transition kernel $\tilde K$. We have $\operatorname{spec}(\tilde P|_{L_2}) = \{1,\frac12,0\}$ and $\operatorname{spec}(\tilde P|_{L_2^0}) = \{\frac12,0\}$. The operator $\tilde P$ has an $L_2$-spectral gap; one obtains
$$\tilde\beta = \tilde\Lambda = \frac12.$$



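Note that $\tilde K = K_{1/2}$. By the special structure of $K^n$ one obtains for $x,y\in D$ that
$$\frac{1}{2^n}\sum_{j=1}^{n}\binom{n}{j}\frac{k^j(x,y)}{\varrho(y)} = \frac{2}{2^n}\Big(\sum_{\substack{1\le j\le n\\ j\text{ odd}}}\binom{n}{j}k(x,y) + \sum_{\substack{2\le j\le n\\ j\text{ even}}}\binom{n}{j}k^2(x,y)\Big) = \big(k(x,y)+k^2(x,y)\big) - \frac{k^2(x,y)}{2^{n-1}} = 1 - \frac{k^2(x,y)}{2^{n-1}}.$$
It follows that
$$\int_{-1}^{1}\operatorname*{ess\,sup}_{y\in[-1,1]}\left|\frac{1}{2^n}\sum_{j=1}^{n}\binom{n}{j}\frac{k^j(x,y)}{\varrho(y)} - \Big(1-\frac{1}{2^n}\Big)\right|\varrho(x)\,dx = \int_{-1}^{1}\operatorname*{ess\,sup}_{y\in[-1,1]}\left|\frac{1}{2^n} - \frac{k^2(x,y)}{2^{n-1}}\right|\frac{dx}{2} = \frac{1}{2^n}.$$
By Lemma 3.46 we get with $s = 1/2$ that the kernel $\tilde K$ is $L_1$-exponentially convergent with $(1/2, 3)$, i.e.
$$\|\tilde P^n - S\|_{L_1\to L_1} \le \frac{3}{2^n}, \qquad n\in\mathbb{N}.$$
The parameter $\alpha = 1/2$ of the $L_1$-exponential convergence is optimal, since $\tilde\beta = 1/2$ and in general, for a reversible, $L_1$-exponentially convergent transition kernel with $(\alpha, M)$, one has $\beta \le \alpha$.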

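Let $\delta\in(0,1)$. Assume that the initial distribution is given by
$$\nu(A) = \frac{1}{\delta}\int_A\mathbf{1}_{[0,\delta]}(x)\,dx, \qquad A\in\mathcal{B}([-1,1]),$$
i.e. the initial state is chosen with respect to the uniform distribution in $[0,\delta]$. Then
$$\left\|\frac{d\nu}{d\pi_\varrho}-1\right\|_\infty = \operatorname*{ess\,sup}_{x\in[-1,1]}\left|\frac{2\cdot\mathbf{1}_{[0,\delta]}(x)}{\delta} - 1\right| = \frac{2}{\delta}-1.$$
Theorem 3.45 (i) suggests the choice
$$n_0 = \left\lceil\frac{\log\big(3(\frac{2}{\delta}-1)\big)}{\log(2)}\right\rceil,$$
such that for $S_{n,n_0}$, which uses a Markov chain with transition kernel $\tilde K$ and initial distribution $\nu$, one has
$$\sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f) \le \sqrt{\frac{4}{n} + \frac{8}{n^2}}. \qquad (3.29)$$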
By Remark 3.38, by the $L_1$-exponential convergence of $\tilde K$ with $(1/2, 3)$ and $\tilde\Lambda = \tilde\beta$, one obtains the lower bound
$$\sqrt{\frac{3}{n} - \frac{16}{n^2}} \le \sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f). \qquad (3.30)$$
By Corollary 3.37 one has for all $u\in\operatorname{Eig}(\tilde P,\frac12) = \operatorname{Eig}(P,0)$ with $\|u\|_2 = 1$ that
$$e_\pi(S_n,u)^2 = \sup_{\|f\|_2\le1} e_\pi(S_n,f)^2 = \lim_{n_0\to\infty}\sup_{\|f\|_2\le1} e_\nu(S_{n,n_0},f)^2.$$
This motivates the comparison of the lower error bound, the upper error bound and the exact error for a specific $u\in\operatorname{Eig}(\tilde P,\frac12)$. Namely, let
$$u(x) = \begin{cases}-1, & x\in[-1,-\frac12]\cup[0,\frac12),\\ \ \ 1, & x\in(-\frac12,0)\cup[\frac12,1].\end{cases}$$
By $u^2 = 1$ we get
$$L_j(u^2) = 0 \quad\text{and}\quad L_j(u\tilde P^k u) = \frac{1}{2^k}L_j(u^2) = 0, \qquad j,k\in\mathbb{N}.$$
Hence by Proposition 3.29 one has
$$e_\nu(S_{n,n_0},u) = e_\pi(S_n,u) = \sqrt{\frac{3}{n} - \frac{4(1-2^{-n})}{n^2}}. \qquad (3.31)$$

In Figure 3.5, for $\delta = 10^{-3}$, the exact error (3.31), the upper error bound (3.29) and the lower bound (3.30) are plotted. The lower bound leads to a non-trivial estimate if $N \ge n_0 + 6 = 19$. The curve of the upper error estimate is shifted upwards relative to the exact error, because the coefficient of its leading term is worse than the coefficient of the leading term of the exact error $e_\nu(S_{n,n_0},u)$.
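The exact formula (3.31) can be compared with a simulation of the lazy chain. The following Monte Carlo sketch starts in $\nu$, uniform on $[0,\delta]$ with $\delta = 10^{-3}$, and uses the burn-in $n_0 = 13$ from above. It averages $u$ over the next $n$ states and estimates the root mean squared error. The sampling details are our own choices, e.g. uniform draws on half-open intervals, which differ from the text only on null sets. All function names are hypothetical.

```python
import math
import random


def lazy_flip_step(x, rng):
    """One step of the lazy version of the kernel of Example 2."""
    if rng.random() < 0.5:
        return x                     # lazy part: stay put with probability 1/2
    if x <= 0.0:
        return 1.0 - rng.random()    # uniform on (0, 1]
    return rng.random() - 1.0        # uniform on [-1, 0)


def u(x):
    """The eigenfunction u from above (eigenvalue 1/2 of the lazy operator)."""
    return -1.0 if (x <= -0.5 or 0.0 <= x < 0.5) else 1.0


def exact_error(n):
    return math.sqrt(3.0 / n - 4.0 * (1.0 - 2.0 ** (-n)) / n ** 2)   # (3.31)


def upper_bound(n):
    return math.sqrt(4.0 / n + 8.0 / n ** 2)                         # (3.29)


def burn_in(delta):
    return math.ceil(math.log(3.0 * (2.0 / delta - 1.0)) / math.log(2.0))


def mc_rmse(n, delta, reps, seed=0):
    """Monte Carlo estimate of e_nu(S_{n,n0}, u); note S(u) = 0."""
    rng = random.Random(seed)
    n0 = burn_in(delta)
    sq = 0.0
    for _ in range(reps):
        x = delta * rng.random()     # X_1 ~ nu, uniform on [0, delta)
        for _ in range(n0):          # burn-in: move to X_{n0+1}
            x = lazy_flip_step(x, rng)
        s = 0.0
        for _ in range(n):           # average u(X_{n0+1}), ..., u(X_{n0+n})
            s += u(x)
            x = lazy_flip_step(x, rng)
        sq += (s / n) ** 2
    return math.sqrt(sq / reps)
```

Since $L_j(u^2) = 0$ and $L_j(u\tilde P^k u) = 0$, the simulated error should match $e_\pi(S_n,u)$ up to Monte Carlo noise for any burn-in.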

Lemma 3.46 provides a tool which can be used to show $L_1$-exponential convergence for several examples. Unfortunately, it is rather difficult to apply in more sophisticated applications. Next let us present the Metropolis-Hastings algorithm.

Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm, suggested in [MRR+53] and extended in [Has70], is widely used in applications. The following introduction is based on Mengersen and Tweedie [MT96]. Suppose that the state space $D$ is contained in $\mathbb{R}^d$ and equipped with $\mathcal{B}(D)$. Let $\pi_\varrho$ be a probability measure on $(D,\mathcal{B}(D))$ given by a possibly unnormalized density $\varrho$ with respect to the Lebesgue measure, i.e.
$$\pi_\varrho(A) = \frac{\int_A\varrho(x)\,dx}{\int_D\varrho(x)\,dx}, \qquad A\in\mathcal{B}(D).$$
Let $q\colon D\times D\to[0,\infty]$ be a function such that $q(x,\cdot)$ is integrable with respect to the Lebesgue measure for all $x\in D$, and assume that
$$Q(x,A) = \int_A q(x,y)\,dy + \mathbf{1}_A(x)\Big(1 - \int_D q(x,y)\,dy\Big), \qquad x\in D,\ A\in\mathcal{B}(D),$$
Figure 3.5: Example 2: exact error (3.31), upper bound (3.29) and lower bound (3.30), plotted against $N = n_0 + n$ for $\delta = 10^{-3}$ and $n_0 = \left\lceil\frac{\log(3(\frac{2}{\delta}-1))}{\log(2)}\right\rceil = 13$.

is a transition kernel. It might happen that for some $x\in D$ one has $Q(x,\{x\}) > 0$. If $Q(x,\{x\}) = 0$ for all $x\in D$, then $q$ is a transition density of $Q$. The question is how to modify $Q$ to get a transition kernel with stationary distribution $\pi_\varrho$. For $x,y\in D$ let
$$\theta(x,y) = \begin{cases}\min\Big\{1,\dfrac{\varrho(y)q(y,x)}{\varrho(x)q(x,y)}\Big\}, & \varrho(x)q(x,y) > 0,\\[1ex] 1, & \varrho(x)q(x,y) = 0,\end{cases}$$
be the acceptance probability. Then the Metropolis-Hastings transition kernel $K_\varrho$ is defined by
$$K_\varrho(x,A) = \int_A\theta(x,y)\,Q(x,dy) + \mathbf{1}_A(x)\int_D\big(1-\theta(x,y)\big)\,Q(x,dy)$$
$$= \int_A\theta(x,y)q(x,y)\,dy + \mathbf{1}_A(x)\Big(\int_D\big(1-\theta(x,y)\big)q(x,y)\,dy + Q(x,\{x\})\Big),$$
where $x\in D$ and $A\in\mathcal{B}(D)$. In this setting $Q$ is called the proposal transition kernel of $K_\varrho$. If $q(x,y) = q(y,x)$ for all $x,y\in D$, then we call $K_\varrho$ the Metropolis transition kernel. By construction, the transition kernel $K_\varrho$ is reversible with respect to $\pi_\varrho$; thus it has the desired stationary distribution.

Lemma 3.48. The Metropolis-Hastings transition kernel $K_\varrho$ is reversible with respect to $\pi_\varrho$.
Proof. It is enough to show that
$$\int_A K_\varrho(x,B)\,\pi_\varrho(dx) = \int_B K_\varrho(x,A)\,\pi_\varrho(dx)$$
for disjoint $A,B\in\mathcal{B}(D)$. Then the assertion follows by the symmetry $\theta(x,y)q(x,y)\varrho(x) = \theta(y,x)q(y,x)\varrho(y)$ and Fubini's Theorem. □
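The symmetry $\theta(x,y)q(x,y)\varrho(x) = \theta(y,x)q(y,x)\varrho(y)$ behind Lemma 3.48 can be seen concretely on a finite state space. There, Lebesgue densities are replaced by probability vectors and the proposal is a stochastic matrix. This discrete analogue is an assumption made for illustration. The sketch builds the Metropolis-Hastings matrix and checks detailed balance and stationarity.

```python
import numpy as np


def mh_matrix(rho, Qmat):
    """Metropolis-Hastings transition matrix for an unnormalized target rho and proposal Qmat."""
    d = len(rho)
    K = np.zeros((d, d))
    for x in range(d):
        for y in range(d):
            if x == y:
                continue
            den = rho[x] * Qmat[x, y]
            theta = 1.0 if den == 0 else min(1.0, rho[y] * Qmat[y, x] / den)  # acceptance probability
            K[x, y] = theta * Qmat[x, y]
        K[x, x] = 1.0 - K[x].sum()  # rejected proposals and Q(x, {x}) stay at x
    return K


if __name__ == "__main__":
    rho = np.array([1.0, 2.0, 3.0, 4.0])          # unnormalized target
    rng = np.random.default_rng(0)
    Qmat = rng.random((4, 4))
    Qmat /= Qmat.sum(axis=1, keepdims=True)
    K = mh_matrix(rho, Qmat)
    pi = rho / rho.sum()
    print(np.abs(pi[:, None] * K - (pi[:, None] * K).T).max())  # detailed balance defect
```

Note that the normalizing constant of $\varrho$ never enters $K$, which is what makes the method usable for unnormalized densities.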

The Metropolis-Hastings algorithm, which simulates a transition of the Metropolis-Hastings transition kernel, works as follows. Let $x\in D$ be the current state. Choose a proposal state $y$ with respect to $Q(x,\cdot)$. Toss a coin which shows "head" with probability $\theta(x,y)$. If it shows "head", then accept the proposal state, i.e. return $y$. Otherwise reject the proposal, i.e. return $x$. The procedure Metropolis-Step$(x, Q, \varrho)$ returns the next state.



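Procedure Metropolis-Step$(x, Q, \varrho)$

input: current state $x$, proposal kernel $Q$, unnormalized density $\varrho$.
output: next state $y$.

Choose $y$ with respect to $Q(x,\cdot)$;
Compute
$$\theta(x,y) = \begin{cases}\min\Big\{1,\dfrac{\varrho(y)q(y,x)}{\varrho(x)q(x,y)}\Big\}, & \varrho(x)q(x,y) > 0,\\[1ex] 1, & \varrho(x)q(x,y) = 0;\end{cases}$$
if rand() $> \theta(x,y)$ then
    $y := x$;
end
Return $y$.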
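The procedure above translates almost line by line into Python. In the sketch below, rand() is `random.Random.random`, which is uniform on $[0,1)$. The proposal is passed as a sampler plus its density, and constant factors of the density cancel in $\theta$. The random-walk example with target $N(0,1)$ and step size `SIGMA = 2.4` is a hypothetical usage choice, not taken from the text.

```python
import math
import random


def metropolis_step(x, propose, q, rho, rng):
    """One transition of the Metropolis-Hastings kernel K_rho.

    propose(x, rng) draws y from the proposal kernel Q(x, .),
    q(x, y) is the proposal density (constant factors cancel in the ratio),
    rho is the possibly unnormalized target density.
    """
    y = propose(x, rng)
    den = rho(x) * q(x, y)
    # acceptance probability theta(x, y); theta = 1 if rho(x) q(x, y) = 0
    theta = 1.0 if den == 0.0 else min(1.0, rho(y) * q(y, x) / den)
    if rng.random() > theta:  # reject with probability 1 - theta
        y = x
    return y


# Hypothetical example: random-walk proposal N(x, SIGMA^2), target N(0, 1).
SIGMA = 2.4


def propose(x, rng):
    return x + SIGMA * rng.gauss(0.0, 1.0)


def q(x, y):
    return math.exp(-0.5 * ((y - x) / SIGMA) ** 2)


def rho(x):
    return math.exp(-0.5 * x * x)


def run_chain(n_steps, x0=0.0, seed=1):
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n_steps):
        x = metropolis_step(x, propose, q, rho, rng)
        path.append(x)
    return path
```

Because the random-walk proposal is symmetric, this instance is a Metropolis transition kernel in the sense above, and the factors $q(y,x)/q(x,y)$ cancel.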



If $q(x,y) = q(y)$ for all $x,y\in D$, then the proposal transition kernel samples independently of $x$. In this situation one can apply the following result.

Theorem 3.49. Let $q\colon D\to[0,\infty]$ be a function with $\int_D q(x)\,dx = 1$. Let the proposal transition kernel of the Metropolis-Hastings transition kernel $K_\varrho$ be $Q(x,A) = \int_A q(y)\,dy$ for $x\in D$ and $A\in\mathcal{B}(D)$. If there exists a $\gamma > 0$ such that
$$\frac{q(y)}{\varrho(y)} \ge \gamma, \qquad y\in D,$$
then $K_\varrho$ is uniformly ergodic. We obtain
$$\|K_\varrho^n(x,\cdot) - \pi_\varrho\|_{\mathrm{tv}} \le (1-\gamma)^n, \qquad x\in D,\ n\in\mathbb{N}.$$
Proof. See [MT96, Theorem 2.1, p. 105]. □

Remark 3.50. The proof is based on the well-known equivalence that a transition kernel $K$ is uniformly ergodic if and only if the whole state space $D$ is a small set. A set $R\in\mathcal{B}(D)$ is called small if there exist a $\gamma > 0$, an $m\in\mathbb{N}$ and a probability measure $\xi$ such that
$$K^m(x,A) \ge \gamma\,\xi(A), \qquad x\in R,\ A\in\mathcal{B}(D).$$
The result of Theorem 3.49 will be demonstrated for a toy example, stated in [MT96, p. 107].

Let $D = \mathbb{R}$ and $\mathcal{D} = \mathcal{B}(\mathbb{R})$. Note that the state space is unbounded. The desired distribution is given by the density
$$\varrho(y) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{y^2}{2}\Big), \qquad y\in\mathbb{R},$$
i.e. $\pi_\varrho$ is an $N(0,1)$ distribution. By $N(\mu,\sigma^2)$ we denote the normal distribution with mean $\mu$ and variance $\sigma^2$. Furthermore, assume that the proposal transition kernel samples independently from $N(0,\xi^2)$, so that
$$q(y) = \frac{1}{\sqrt{2\pi}\,\xi}\exp\Big(-\frac{y^2}{2\xi^2}\Big), \qquad y\in\mathbb{R}.$$
Let $\xi > 1$. Then
$$\frac{q(y)}{\varrho(y)} = \frac{1}{\xi}\exp\Big(\frac{y^2}{2}\big(1-\xi^{-2}\big)\Big) \ge \frac{1}{\xi}, \qquad y\in\mathbb{R},$$
which implies that
$$\|K_\varrho^n(x,\cdot) - \pi_\varrho\|_{\mathrm{tv}} \le (1-\xi^{-1})^n, \qquad x\in D,\ n\in\mathbb{N}.$$
By the reversibility of the Metropolis-Hastings transition kernel with respect to $\pi_\varrho$, an immediate consequence is that uniform ergodicity implies $L_1$-exponential convergence, since $\pi$-a.e. uniform ergodicity is equivalent to $L_1$-exponential convergence. Hence we have a transition kernel which is $L_1$-exponentially convergent with $(1-\xi^{-1}, 2)$. This implies that the Markov operator $P$ which corresponds to the transition kernel $K_\varrho$ has an $L_2$-spectral gap; we have $1-\beta \ge \xi^{-1}$.
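A short simulation makes the toy example concrete. The sketch runs the independence sampler with target $N(0,1)$ and proposal $N(0,\xi^2)$. It computes the acceptance ratio in log form to avoid overflow. It also checks the minorization constant $\gamma = 1/\xi$ of Theorem 3.49 on a grid. The parameter $\xi = 2$ and all function names are illustrative assumptions.

```python
import math
import random


def log_accept_ratio(x, y, xi):
    """log of rho(y) q(x) / (rho(x) q(y)) for target N(0, 1) and proposal N(0, xi^2).

    With rho(y) ~ exp(-y^2/2) and q(y) ~ exp(-y^2/(2 xi^2)) all normalizing constants cancel.
    """
    return 0.5 * (x * x - y * y) * (1.0 - 1.0 / xi ** 2)


def independent_mh_chain(xi, n_steps, x0=0.0, seed=0):
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n_steps):
        y = rng.gauss(0.0, xi)  # proposal does not depend on the current state
        if rng.random() < math.exp(min(0.0, log_accept_ratio(x, y, xi))):
            x = y
        path.append(x)
    return path


def density_ratio(y, xi):
    """q(y) / rho(y) with both densities normalized; Theorem 3.49 needs it >= gamma."""
    return math.exp(0.5 * y * y * (1.0 - 1.0 / xi ** 2)) / xi


def tv_bound(xi, n):
    """(1 - 1/xi)^n, the total variation bound with gamma = 1/xi."""
    return (1.0 - 1.0 / xi) ** n
```

The ratio $q/\varrho$ attains its minimum $1/\xi$ at $y = 0$, so $\gamma = 1/\xi$ is the best constant for this proposal.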

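Let $\delta\in(0,1)$ and $x_0\in[0,\infty)$. The initial state is chosen uniformly distributed in $[x_0-\delta, x_0+\delta]$. Then
$$\frac{d\nu}{d\pi_\varrho}(x) = \sqrt{\frac{\pi}{2}}\,\frac{1}{\delta}\,\mathbf{1}_{[x_0-\delta,x_0+\delta]}(x)\exp\Big(\frac{x^2}{2}\Big), \qquad x\in D.$$
We obtain
$$\left\|\frac{d\nu}{d\pi_\varrho}-1\right\|_\infty \le \sqrt{\frac{\pi}{2}}\,\frac{1}{\delta}\exp\Big(\frac{(x_0+\delta)^2}{2}\Big).$$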

The method $S_{n,n_0}$ uses a Markov chain with transition kernel $K_\varrho$ and initial distribution $\nu$. The burn-in is chosen almost as suggested in Theorem 3.45 (i). We use $\log\big((1-\xi^{-1})^{-1}\big) \ge \xi^{-1}$ to estimate the burn-in, such that we set



no 



{xo + 5f 



0.23 



Then 



sup e^(S'„,„o,/)^ < — + 

Il/ll2<l " " 
Contracting Normals



The next example is described in [Bax05], see also [RR97b, RT99]. Let $D = \mathbb{R}$, $\mathcal{D} = \mathcal{B}(\mathbb{R})$ and $\theta\in(-1,1)$. Note that the state space is unbounded. The transition kernel is given by
$$K(x,A) = \frac{1}{\sqrt{2\pi(1-\theta^2)}}\int_A\exp\Big(-\frac{(y-\theta x)^2}{2(1-\theta^2)}\Big)dy, \qquad x\in\mathbb{R},\ A\in\mathcal{B}(\mathbb{R}),$$
so that $K(x,\cdot)$ is an $N(\theta x, 1-\theta^2)$ distribution. By some elementary calculation one can see that a stationary distribution is
$$\pi(A) = \frac{1}{\sqrt{2\pi}}\int_A\exp\Big(-\frac{y^2}{2}\Big)dy, \qquad A\in\mathcal{B}(\mathbb{R}),$$
i.e. $\pi$ is an $N(0,1)$ distribution. The transition kernel $K$ is reversible with respect to $\pi$. Suppose that $\theta\in(0,1)$. Then the Markov operator is positive semi-definite, i.e. $\langle Pf,f\rangle \ge 0$ for all $f\in L_2$. The next result is an application of [Bax05, Theorem 1.3, p. 702], where the Markov operator is self-adjoint and positive semi-definite. The same example is considered in [Bax05, p. 728] and [LN11, p. 33].
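A simple consistency check for this example uses the function $f(x) = x$. One has $\|f\|_2 = 1$, $S(f) = 0$ and $Pf(x) = \mathbb{E}[\theta x + \sqrt{1-\theta^2}Z] = \theta x$, so $f$ is an eigenfunction with eigenvalue $\theta$. For a chain started in $\pi$, the squared error is therefore $\frac{1+\theta}{n(1-\theta)} - \frac{2\theta(1-\theta^n)}{n^2(1-\theta)^2}$, in line with (3.22). The vectorized NumPy sketch below compares this with a simulation. The choices $\theta = 0.5$, $n = 100$ and the number of replications are arbitrary.

```python
import numpy as np


def contracting_normals_means(theta, n, reps, seed=0):
    """Return `reps` independent realizations of S_n(f) for f(x) = x, started in N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(reps)  # X_1 ~ pi = N(0, 1), one chain per replication
    total = np.zeros(reps)
    for _ in range(n):
        total += x
        # X_{k+1} ~ N(theta X_k, 1 - theta^2)
        x = theta * x + np.sqrt(1.0 - theta ** 2) * rng.standard_normal(reps)
    return total / n


def eigenfunction_error_sq(lam, n):
    """e_pi(S_n, f)^2 for a normalized eigenfunction f in L_2^0 with eigenvalue lam."""
    return (1 + lam) / (n * (1 - lam)) - 2 * lam * (1 - lam ** n) / (n ** 2 * (1 - lam) ** 2)


if __name__ == "__main__":
    s = contracting_normals_means(0.5, 100, 20000)
    print(np.mean(s ** 2), eigenfunction_error_sq(0.5, 100))
```

Multiplying by $n$ and letting $n\to\infty$ gives $(1+\theta)/(1-\theta)$, which equals $\langle (I+P)(I-P)^{-1}f, f\rangle$ for this eigenfunction.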

Lemma 3.51. Let $\theta\in(0,1)$, $c\in(1,\infty)$ and set

2(1-02) 



K = 2 

B^2 



1 + C2 



I) 



a 



1 



f{l + ff)c 

V\/r~~02 



- $ 



9c 



Vi^ 



where $\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}\exp\Big(-\frac{y^2}{2}\Big)dy$.



log(A-i) ' 
/3 = max|A, (1-B)i/"| 

Then 



< 1. 



Proof. See [Bax05, Theorem 1.3, p. 702 and p. 728]. □



Let us illustrate the last lemma. For any fixed $\theta$ one can numerically minimize the upper estimate $\hat\beta$ of $\beta$ with respect to $c$. For example, let $\theta = 0.5$. Then one gets $\hat\beta = 0.8946$ for $c = 1.6041$.



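There exists an $L_2$-spectral gap, thus we can apply Theorem 3.45 for $p\in(2,\infty]$. Let $\delta\in(0,1)$ and $x_0\in[0,\infty)$. The initial state is chosen uniformly distributed on $[x_0-\delta, x_0+\delta]$. The density of the initial distribution with respect to $\pi$ is given by
$$\frac{d\nu}{d\pi}(x) = \sqrt{\frac{\pi}{2}}\,\frac{1}{\delta}\,\mathbf{1}_{[x_0-\delta,x_0+\delta]}(x)\exp\Big(\frac{x^2}{2}\Big), \qquad x\in\mathbb{R}.$$
Then for all $q\in[1,\infty]$ it follows that
$$\left\|\frac{d\nu}{d\pi}-1\right\|_q \le \left\|\frac{d\nu}{d\pi}-1\right\|_\infty \le \sqrt{\frac{\pi}{2}}\,\frac{1}{\delta}\exp\Big(\frac{(x_0+\delta)^2}{2}\Big).$$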
The burn-in is chosen as suggested in Theorem 3.45, where we use the previously stated estimate of $\|d\nu/d\pi\|_q$. Suppose that the burn-in $n_0(p)$ is the smallest natural number (including zero) which is greater than or equal to

$$\frac{1}{1-\hat\beta}\cdot\frac{p}{2(p-2)}\left(\log\left(\frac{32p}{p-2}\right) + \log\left(\frac{\sqrt{\pi}}{\sqrt{2}\,\delta}\right) + \frac{(x_0+\delta)^2}{2}\right), \qquad p \in (2,4).$$

Then

$$\sup_{\|f\|_p \le 1} e_\nu(S_{n,n_0(p)}, f)^2 \le \frac{2}{n(1-\hat\beta)} + \frac{2}{n^2(1-\hat\beta)^2}.$$



In Table 3.2 one can see how many resources $N = n + n_0$ are sufficient to obtain an error less than $\varepsilon = 0.01$.



$\theta$ | $c$ | $\hat\beta$ | $n_0$ (for $p = 2.1$) | $n$ (for precision $\varepsilon = 0.01$) | $N$
0.91 | 1.12845 | 0.999664 | 2.82241·10^5 | 5.94614·10^7 | 5.97437·10^7
0.92 | 1.11691 | 0.999816 | 5.16275·10^5 | 1.08759·10^8 | 1.09275·10^8
0.93 | 1.10499 | 0.999912 | 1.08257·10^6 | 2.28043·10^8 | 2.29126·10^8
0.94 | 1.09260 | 0.999966 | 2.76738·10^6 | 5.82923·10^8 | 5.85690·10^8
0.95 | 1.07964 | 0.999990 | 9.60536·10^6 | 2.02337·10^9 | 2.03297·10^9
0.96 | 1.06599 | 0.999998 | 5.58578·10^7 | 1.17624·10^10 | 1.18183·10^10

Table 3.2: Contracting normals. The initial distribution $\nu$ is chosen with $x_0 = 0$ and $\delta = 0.1$. The burn-in of Theorem 3.45 is computed for $p = 2.1$, and $n$ is computed such that one obtains an error less than $\varepsilon = 0.01$. The estimate $\hat\beta$ of $\beta$ is computed by a minimizing procedure of Maple for $c > 1.01$.
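The recipe behind Table 3.2 can be sketched as follows, assuming the burn-in formula for $p \in (2,4)$ and the error bound stated above. The function names are hypothetical. `sample_size` returns the smallest $n$ with $2/(n(1-\hat\beta)) + 2/(n^2(1-\hat\beta)^2) \le \varepsilon^2$, obtained by solving the quadratic inequality in $x = n(1-\hat\beta)$. Because the table only shows $\hat\beta$ rounded to six digits, feeding in the rounded value reproduces the table entries only approximately; the ratio $n_0/n$, which does not depend on $\hat\beta$, is about $4.75 \cdot 10^{-3}$ in every row of the table.

```python
import math


def burn_in_p_small(one_minus_beta, p, delta, x0):
    """Burn-in n0(p) for p in (2, 4) in the contracting normals example."""
    if not 2.0 < p < 4.0:
        raise ValueError("this branch of the burn-in formula needs p in (2, 4)")
    bracket = (math.log(32.0 * p / (p - 2.0))
               + math.log(math.sqrt(math.pi) / (math.sqrt(2.0) * delta))
               + (x0 + delta) ** 2 / 2.0)
    return math.ceil(p / (2.0 * (p - 2.0)) * bracket / one_minus_beta)


def sample_size(one_minus_beta, eps):
    """Smallest n with 2/(n(1-beta)) + 2/(n(1-beta))**2 <= eps**2."""
    # With x = n(1 - beta) the condition reads eps^2 x^2 - 2x - 2 >= 0.
    x = (2.0 + math.sqrt(4.0 + 8.0 * eps ** 2)) / (2.0 * eps ** 2)
    return math.ceil(x / one_minus_beta)


one_minus_beta = 1.0 - 0.999664  # rounded beta_hat from the first row of Table 3.2
n0 = burn_in_p_small(one_minus_beta, p=2.1, delta=0.1, x0=0.0)
n = sample_size(one_minus_beta, eps=0.01)
print(n0, n, n + n0)  # n0, n and N = n + n0
```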



3.5. Notes and remarks

In the last decades, explicit error bounds and confidence estimates for Markov chain Monte Carlo methods on general state spaces have attracted more and more attention. In the following, let us present how the results fit into the published literature.

In the seminal work of Lovász and Simonovits [LS93] an estimate of $e_\pi(S_n,f)^2$ is shown. The paper deals with the computation of the volume of a convex body by a randomized algorithm based on Markov chains. Let us explain the result of [LS93, Theorem 1.9, p. 375] in detail. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition kernel $K$ and initial distribution $\nu$, and let $K$ be reversible with respect to a probability measure $\pi$. Then let us define the conductance as

$$\varphi(K,\pi) = \inf_{0<\pi(A)\le 1/2} \frac{\int_A K(x,A^c)\,\pi(dx)}{\pi(A)}.$$

Assume that the Markov operator is positive semi-definite, i.e. $\langle Pf, f\rangle \ge 0$ for all $f \in L_2$. Then

$$e_\pi(S_n,f)^2 \le \frac{4}{\varphi(K,\pi)^2\, n}\,\|f\|_2^2. \qquad (3.32)$$
The result is slightly worse than the result of Proposition 3.26. In Proposition 3.26 one has an exact error formula for $e_\pi(S_n,f)^2$; mainly the spectral structure of the Markov operator is used. In Corollary 3.27 this exact error formula is further estimated and one obtains

$$e_\pi(S_n,f)^2 \le \frac{2}{n(1-\Lambda)}\,\|f\|_2^2, \qquad \text{where } \Lambda = \sup\{\alpha \mid \alpha \in \mathrm{spec}(P|_{L_2^0})\}. \qquad (3.33)$$

The Cheeger inequality¹, given by $1 - \Lambda \ge \varphi(K,\pi)^2/2$, provides a relation between $\Lambda$ and $\varphi(K,\pi)$, so that (3.33) implies (3.32). Note that in Proposition 3.26 and Corollary 3.27 it is not assumed that the Markov operator is positive semi-definite, such that the assumptions are slightly less restrictive. But if one has a transition kernel $K$ which determines a not necessarily positive semi-definite transition operator, then one can pass over to the lazy version of $K$ and obtain positive semi-definiteness. However, the estimate (3.32) covers the important facts, and it seems that the refinement of Proposition 3.26 is well known.

The paper of Mathé [Mat99] contains results concerning the asymptotic integration error for uniformly ergodic Markov chains which are reversible with respect to $\pi$. For example, it is shown that for any initial distribution $\nu$ one has

$$\lim_{n\to\infty} n \cdot \sup_{\|f\|_2 \le 1} e_\nu(S_{n,n_0},f)^2 = \frac{1+\Lambda}{1-\Lambda},$$

and for $f \in L_2$ it is proven that

$$\lim_{n\to\infty} n \cdot e_\nu(S_{n,n_0},f)^2 = \langle (I-P)^{-1}(I+P)g, g\rangle, \qquad \text{where } g = f - S(f).$$

The same result is part of Corollary 3.37 and, for individual $f$, part of Theorem 3.34. In [Mat04] the asymptotic integration error is studied for not necessarily reversible and not necessarily uniformly ergodic Markov chains. It is assumed that the transition kernel is $V$-uniformly ergodic, see (3.36). For further details let us refer to [Mat04].

In [Rud09, Theorem 8, p. 19] an explicit upper error bound of $e_\nu(S_{n,n_0},f)^2$ for general state spaces is provided. The result is based on [LS93, Theorem 1.9, p. 375], and the assumptions are the same: the transition kernel $K$ is reversible with respect to $\pi$ and the transition operator $P$ is positive semi-definite. After a burn-in

$$n_0 \ge \frac{\log\left(\left\|\frac{d\nu}{d\pi}\right\|_\infty\right)}{\varphi(K,\pi)^2}, \quad \text{the error obeys} \quad e_\nu(S_{n,n_0},f)^2 \le \frac{100}{\varphi(K,\pi)^2\, n}\,\|f\|_\infty^2. \qquad (3.34)$$

The proof of the result is based on Proposition 3.29, which provides the crucial relation between $e_\nu(S_{n,n_0},f)^2$ and $e_\pi(S_{n,n_0},f)^2$. By Theorem 3.41 and Theorem 3.45 one obtains a refined error estimate and a refined recipe for the choice of $n_0$. Note that positive semi-definiteness and reversibility are not needed in Theorem 3.41; it is enough that there exists an $L_2$-spectral gap, i.e. $1 - \beta > 0$.

¹ The Cheeger inequality is stated in Section A.3.

Independently of [Rud09, Theorem 8, p. 19], in the work of Belloni and Chernozhukov [BC09, Theorem 3, p. 2031] a similar error bound for $S_{n,n_0}$ is proven. It is also based on [LS93, Theorem 1.9, p. 375], such that again the transition kernel is assumed to be reversible with respect to $\pi$ and the Markov operator must be positive semi-definite. Then it is shown that

$$e_\nu(S_{n,n_0},f)^2 \le e_\pi(S_n,f)^2 + 8\,\|f\|_\infty^2 \sup_{A\in\mathcal{B}(D)} \left|\nu K^{n_0}(A) - \pi(A)\right|.$$

Let the initial distribution $\nu$ be $R$-warm, i.e. $\sup_{\pi(A)>0} \frac{\nu(A)}{\pi(A)} \le R$. Then one obtains by [LS93, Corollary 1.5, p. 372] that

$$\sup_{A\in\mathcal{B}(D)} \left|\nu K^{n_0}(A) - \pi(A)\right| \le \sqrt{R}\left(1 - \frac{\varphi(K,\pi)^2}{2}\right)^{n_0}.$$

Hence by [LS93, Theorem 1.9, p. 375] one has

$$e_\nu(S_{n,n_0},f)^2 \le \frac{4}{\varphi(K,\pi)^2\, n}\,\|f\|_2^2 + 8\sqrt{R}\left(1 - \frac{\varphi(K,\pi)^2}{2}\right)^{n_0} \|f\|_\infty^2. \qquad (3.35)$$

This explicit error bound for $S_{n,n_0}$, when the initial distribution is not the stationary one, is the same as in [Rud09, Theorem 8, p. 19]. Note that the burn-in depends on the desired precision. We can choose $R = \|d\nu/d\pi\|_\infty$, and if one uses $\|f\|_2 \le \|f\|_\infty$, then the upper bound of (3.35) can be further estimated and one obtains an estimate with respect to $\|\cdot\|_\infty$.

Another result, due to Łatuszyński and Niemiro, is presented in [LN11]. The integration error for $V$-uniformly ergodic Markov chains is estimated, where $V\colon D \to [1,\infty)$ is a drift function. The weighted class of functions

$$L_V = L_V(D) = \left\{ f\colon D \to \mathbb{R} \;\middle|\; |f|_V = \sup_{x\in D} \frac{|f(x)|}{V(x)} < \infty \right\}$$

is studied. Let $\alpha \in [0,1)$ and $M < \infty$. A transition kernel $K$ is called $V$-uniformly ergodic with $(\alpha, M)$ if

$$\|P^n - S\|_{L_V \to L_V} \le M\alpha^n, \qquad n \in \mathbb{N}. \qquad (3.36)$$

One can substitute the drift function $V$ by $V^{1/r}$ for all $r > 1$. Then there exist an $\alpha(r) \in [0,1)$ and an $M(r) < \infty$ such that

$$\|P^n - S\|_{L_{V^{1/r}} \to L_{V^{1/r}}} \le M(r)\,\alpha(r)^n, \qquad n \in \mathbb{N}.$$

This is justified by an interpolation argument in [Mat04] and by different assumptions stated in [Bax05]. Now let us state a less general version of the main result of [LN11, Theorem 3.1, p. 28]. For $r = 2$ and $g = f - S(f)$ one has

9\ (, 2M(2)a(2)\ M^a"" ||i/ 



where $\|\nu - \pi\|_V = \sup_{|g|_V \le 1} \left|\int_D g(x)\,(\nu(dx) - \pi(dx))\right|$. This seems to be the first explicit error bound of $S_{n,n_0}$ for integrands $f$ which belong to $L_V$. If the transition kernel is reversible, then $V$-uniform ergodicity with $(\alpha, M)$ is equivalent to the existence of an $L_2$-spectral gap, see [RR97a, RT01]. Furthermore, if $V \in L_p$ for some $p > 2$, then $L_V \subset L_p$ and the error bound of Theorem 3.41 can also be applied. However, in general Theorem 3.41 cannot be used in this setting.
The paper of Joulin and Ollivier [JO10], based on [Oll09], follows a new idea. Let $(D, \mathrm{dist})$ be a complete, separable metric state space with metric $\mathrm{dist}$, and let $K$ be a transition kernel with stationary distribution $\pi$ on $(D, \mathcal{B}(D))$. Let $\mathcal{P}_{\mathrm{dist}}(D)$ be the set of probability measures $\mu$ on $(D, \mathcal{B}(D))$ for which there exists an $x_0 \in D$ such that $\int_D \mathrm{dist}(x_0, y)\,\mu(dy) < \infty$. Then let us define the Wasserstein distance between $\mu_1, \mu_2 \in \mathcal{P}_{\mathrm{dist}}(D)$ by

$$W_1(\mu_1,\mu_2) = \inf_{\xi \in \Pi(\mu_1,\mu_2)} \int_D \int_D \mathrm{dist}(x,y)\,\xi(dx,dy),$$

where $\Pi(\mu_1,\mu_2)$ is the set of probability measures $\xi$ on $(D^2, \mathcal{B}(D^2))$ with marginals $\mu_1$ and $\mu_2$. If there exists a $\kappa > 0$ such that

$$W_1(K(x,\cdot), K(y,\cdot)) \le (1-\kappa)\,\mathrm{dist}(x,y), \qquad x,y \in D, \qquad (3.38)$$

then we say that the transition kernel $K$ has positive Ricci curvature $\kappa$. Let the function $f\colon D \to \mathbb{R}$ be integrable with respect to $\pi$ and let

$$\|f\|_{\mathrm{Lip}} = \sup_{x,y \in D,\, x \ne y} \frac{|f(x)-f(y)|}{\mathrm{dist}(x,y)}.$$

The coarse diffusion constant $\sigma(x)$ for $x \in D$ of the transition kernel is defined by

$$\sigma(x)^2 = \frac{1}{2}\int_D \int_D \mathrm{dist}(y,z)^2\, K(x,dy)\,K(x,dz),$$

and the local dimension for $x \in D$ is defined by

$$n_x = \inf_{\|f\|_{\mathrm{Lip}} = 1} \frac{2\sigma(x)^2}{\int_D \int_D |f(y)-f(z)|^2\, K(x,dz)\,K(x,dy)}.$$

If the transition kernel has positive Ricci curvature, then by [JO10, Proposition 1, p. 2423, and Theorem 2, p. 2424] one obtains that



+ ^-±2 ll/llLp / dist(x,y)if(x,dy) 

K n \J d 

The estimate is reasonable for any deterministic initial state $x \in D$; the initial distribution is $\delta_x$. For further estimates and details let us refer to [JO10]. Let $p \in (2,\infty]$, let $\|f\|_{\mathrm{Lip}} < \infty$ and assume that there exists an $x_0 \in D$ such that $\|\mathrm{dist}(\cdot, x_0)\|_p < \infty$. Then one obtains $f \in L_p$; in particular, since $|f(x)| \le |f(x_0)| + \|f\|_{\mathrm{Lip}}\,\mathrm{dist}(x, x_0)$,

$$\|f\|_p \le \|f\|_{\mathrm{Lip}}\,\|\mathrm{dist}(\cdot, x_0)\|_p + |f(x_0)|.$$

If the transition kernel is reversible with respect to $\pi$ and $\|\sigma\|_2 < \infty$, then one can show that a positive Ricci curvature $\kappa > 0$ implies an $L_2$-spectral gap of $P$; it follows that $1 - \beta \ge \kappa$, see [Oll09, Proposition 30, p. 831]. In this setting Theorem 3.41 can be applied when the initial distribution $\nu$ belongs to $\mathcal{M}_{p/(p-2)}$.

A regenerative Markov chain Monte Carlo algorithm for the approximation of $S(f)$ is studied in [LMN09]. Roughly speaking, if one has certain information about a small set, then one can explicitly estimate the mean square error of this regenerative estimator for uniformly and $V$-uniformly ergodic Markov chains, see [LMN09] for details.

The literature also provides confidence estimates for $S_{n,n_0}$. One can apply Lemma 2.27 if an upper bound of $e_\nu(S_{n,n_0},f)^2$ is available. These estimates can be boosted by a median trick explained in [NP09] and applied in [LN11]. However, exponential inequalities such as Hoeffding or Chernoff bounds for Markov chain Monte Carlo are better, see [Kru98, Lez01, GO02, JO10, Mia10]. Asymptotic confidence estimates are discussed in

Let us provide a conclusion. There are different explicit error bounds for the mean square error of $S_{n,n_0}$ on general state spaces. In some situations these estimates could be improved. It seems that the error bound with respect to $\|\cdot\|_2$ has not been known so far. Let us recall that we assumed that the used Markov chain is $L_1$-exponentially convergent and reversible with respect to $\pi$. If we only assume that the Markov chain has an $L_2$-spectral gap, then we showed an estimate of the error uniformly with respect to $\|\cdot\|_p$ for $p \in (2,\infty]$. Upper error bounds with respect to $\|\cdot\|_\infty$ are known, but those with respect to $\|\cdot\|_p$ seem to be new. In this setting it is not assumed that the Markov chain is reversible with respect to $\pi$; we only require that $\pi$ is the stationary distribution. The suggestion of the burn-in $n_0$ of Theorem 3.45 performs well and also appears to be new. All error bounds hold for bounded and unbounded state spaces whenever estimates of the crucial parameters, for example $\Lambda$, $\beta$ or $(\alpha, M)$, are available.
4. Applications



In numerous applications one wants to compute, for $D \subset \mathbb{R}^d$, an integral of the form

$$\int_D f(x)\cdot c\,g(x)\,dx, \qquad (4.1)$$

with density $c\,g$, where the number $c$ is unknown. Of course $c$ can be defined by

$$\frac{1}{c} = \int_D g(x)\,dx.$$

However, it is desirable to have algorithms that are able to compute (4.1) without any precomputation of $c$. Let $\mathcal{F}(D)$ be a class of tuples of the form $(f,g)$, where $g\colon D \to [0,\infty)$ is a possibly unnormalized density with $\int_D g(x)\,dx > 0$, and for $f$ we assume that $f \cdot g$ is integrable with respect to the Lebesgue measure. Then the goal is to compute

$$S(f,g) = \frac{\int_D f(x)\,g(x)\,dx}{\int_D g(x)\,dx} \qquad \text{for } (f,g) \in \mathcal{F}(D). \qquad (4.2)$$

The solution operator $S$ is linear in $f$ but not in $g$. Hence $S$ is a nonlinear functional.

We assume that there are two procedures, $\mathrm{Or}_f$ and $\mathrm{Or}_g$, which provide information about $f$ and $g$, respectively. These procedures are considered as "black boxes" and we call them oracles. Let the oracle $\mathrm{Or}_f$ be a procedure which returns for an input $x \in D$ the function value $f(x)$, i.e. $\mathrm{Or}_f(x) = f(x)$. Unless stated otherwise, we also assume that the oracle $\mathrm{Or}_g$ provides for $x \in D$ the function value $g(x)$, i.e. $\mathrm{Or}_g(x) = g(x)$. We assume that an oracle call is much more expensive than an arithmetic operation. Hence we count the total number of oracle calls which are needed to approximate $S(f,g)$.
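As a small illustration of this cost model, the following Python sketch wraps a function so that each evaluation is counted as one oracle call. The class `CountingOracle` and the example functions are hypothetical; they only show how the number of calls of $\mathrm{Or}_f$ and $\mathrm{Or}_g$, rather than arithmetic, would be tallied.

```python
from typing import Callable, Sequence


class CountingOracle:
    """Wrap a function so that every evaluation counts as one oracle call."""

    def __init__(self, func: Callable[[Sequence[float]], float]) -> None:
        self._func = func
        self.calls = 0

    def __call__(self, x: Sequence[float]) -> float:
        self.calls += 1  # only oracle calls are charged, not arithmetic
        return self._func(x)


or_f = CountingOracle(lambda x: x[0])        # hypothetical integrand f
or_g = CountingOracle(lambda x: 1.0 + x[0])  # hypothetical unnormalized density g
points = [(0.1,), (0.5,), (0.9,)]
values = [(or_f(p), or_g(p)) for p in points]
print(or_f.calls, or_g.calls)
```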

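Let $\mathrm{Alg}_n$ be the class of all randomized algorithms which use at most $n$ calls of the oracle $\mathrm{Or}_f$ and $n$ calls of the oracle $\mathrm{Or}_g$. More precisely, $A_n \in \mathrm{Alg}_n$ is a mapping described by a function $\varphi_{2n}\colon \mathbb{R}^{2n} \to \mathbb{R}$ such that

$$A_n(f,g) = \varphi_{2n}\big(\mathrm{Or}_f(X_1), \ldots, \mathrm{Or}_f(X_n), \mathrm{Or}_g(X_1), \ldots, \mathrm{Or}_g(X_n)\big).$$

The sample $(X_1, \ldots, X_n) \in D^n$ is determined as follows: Let $\omega = (\omega_1, \ldots, \omega_n)$ be a random element with some distribution $W$. Then

$$X_i = X_i\big(\mathrm{Or}_f(X_1), \ldots, \mathrm{Or}_f(X_{i-1}), \mathrm{Or}_g(X_1), \ldots, \mathrm{Or}_g(X_{i-1}), \omega_i\big), \qquad i = 2, \ldots, n.$$

The individual error of $A_n \in \mathrm{Alg}_n$ applied to $(f,g) \in \mathcal{F}(D)$ is, as in the previous chapters, measured in the mean square sense, such that

$$e(A_n,(f,g)) = \left(\mathbb{E}\,\left|S(f,g) - A_n(f,g)\right|^2\right)^{1/2},$$

where the expectation is taken with respect to $W$. The overall error on $\mathcal{F}(D)$ is

$$e(A_n, \mathcal{F}(D)) = \sup_{(f,g) \in \mathcal{F}(D)} e(A_n,(f,g)).$$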
The complexity of the problem (4.2) on $\mathcal{F}(D)$ is given by

$$\mathrm{comp}(\varepsilon, d, \mathcal{F}(D)) = \min\{n \mid \text{there exists } A_n \in \mathrm{Alg}_n \text{ with } e(A_n, \mathcal{F}(D)) \le \varepsilon\}.$$

Note that $d$ is the dimension of the domain $D$. We want to quantify the complexity of a problem with respect to the dimension $d$. The integration problem (4.2) for the class $\mathcal{F}(D)$ is called polynomially tractable if there exist non-negative numbers $c$, $q_1$ and $q_2$ such that

$$\mathrm{comp}(\varepsilon, d, \mathcal{F}(D)) \le c\,\varepsilon^{-q_1} d^{q_2} \qquad \text{for all } d \in \mathbb{N},\ \varepsilon \in (0,1).$$

Roughly speaking, it says that the complexity of computing (4.2) increases at most polynomially in the precision $\varepsilon^{-1}$ and the dimension $d$. For details of the concept of tractability let us refer to Novak and Woźniakowski [NW08, NW10].

Let us provide a result which motivates an additional notion of tractability. We consider the following class of functions

$$\mathcal{F}_C(D) = \left\{(f,g) \;\middle|\; \|f\|_\infty \le 1,\ \frac{\sup g}{\inf g} \le C\right\}.$$

In some applications $C$ can be very large, such as $C = 10^{20}$. Observe that always $S(\mathcal{F}_C(D)) = [-1,1]$, hence the problem is scaled properly. In [MN07] Mathé and Novak proved a lower error bound, see [MN07, Theorem 1, p. 678].

Theorem 4.1. For any $A_n \in \mathrm{Alg}_n$ one obtains

For an upper error bound Mathé and Novak consider the simple Monte Carlo algorithm: evaluate the numerator and the denominator on a common independent sample according to the uniform distribution, say $(X_1, X_2, \ldots, X_n) \in D^n$, and compute

$$A_n^{\mathrm{simple}}(f,g) = \frac{\sum_{j=1}^n f(X_j)\,g(X_j)}{\sum_{j=1}^n g(X_j)}.$$
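A minimal Python sketch of $A_n^{\mathrm{simple}}$ follows, assuming that a sampler for the uniform distribution on $D$ is available. The function name and the test case ($D = [0,1]^2$, $g(x) = e^{x_1}$, $f(x) = x_1$, with exact value $1/(e-1)$) are our own choices. The sketch uses exactly $n$ calls of each oracle, in line with the definition of $\mathrm{Alg}_n$.

```python
import math
import random


def simple_monte_carlo(f, g, sample_uniform, n, seed=0):
    """Ratio estimator sum f(X_j) g(X_j) / sum g(X_j) on one i.i.d. uniform sample."""
    rng = random.Random(seed)
    xs = [sample_uniform(rng) for _ in range(n)]
    g_vals = [g(x) for x in xs]                              # n calls of Or_g
    numerator = sum(f(x) * gx for x, gx in zip(xs, g_vals))  # n calls of Or_f
    return numerator / sum(g_vals)


# Example on D = [0, 1]^2 with unnormalized density g(x) = exp(x_1) and f(x) = x_1.
estimate = simple_monte_carlo(
    f=lambda x: x[0],
    g=lambda x: math.exp(x[0]),
    sample_uniform=lambda rng: (rng.random(), rng.random()),
    n=20_000,
)
print(estimate, 1.0 / (math.e - 1.0))  # estimate versus the exact value
```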



Note that every $X_j$ is uniformly distributed. It is essential that one can sample with respect to the uniform distribution on $D$; this might be a restrictive assumption. In [MN07, Theorem 2, p. 680] the following upper error bound is proven.

Theorem 4.2. For all $n \in \mathbb{N}$ we have



e{Ai^^'^,Fc{D)) < 2min |l, 



From Theorem 4.1 and Theorem 4.2 one obtains that the complexity $\mathrm{comp}(\varepsilon, d, \mathcal{F}_C(D))$ of (4.2) is linear in $C$ and that $A_n^{\mathrm{simple}}$ is almost optimal; for all sufficiently small $\varepsilon > 0$ it follows that

OmCe-^ < comp{s,d,TciD)) < 8Ce"^ 

Hence all algorithms are bad if $C = 10^{20}$. Mathé and Novak suggest considering a smaller class of densities. The main goal is to have tractability also with respect to $C$ on a class of functions, say $\mathcal{T}_C(D)$, where the possibly unnormalized densities satisfy $\sup g/\inf g \le C$. More precisely, the integration problem (4.2) is called tractable also with respect to $C$ if there exist non-negative numbers $c$, $q_1$, $q_2$ and $q_3$ such that

$$\mathrm{comp}(\varepsilon, d, \mathcal{T}_C(D)) \le c\,\varepsilon^{-q_1} d^{q_2} [\log C]^{q_3} \qquad (4.3)$$

for all $\varepsilon \in (0,1)$, $d \in \mathbb{N}$ and $C > 1$, see [NW10, p. 541].

With Markov chain Monte Carlo algorithms one can achieve this goal on certain classes of functions. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition kernel $K$ and initial distribution $\nu$. Assume that the transition kernel has stationary distribution $\pi_g$, where

$$\pi_g(A) = \frac{\int_A g(x)\,dx}{\int_D g(x)\,dx}, \quad A \in \mathcal{B}(D), \qquad \text{so that} \qquad S(f,g) = \int_D f(x)\,\pi_g(dx).$$

Under suitable assumptions on the Markov chain and on $(f,g) \in \mathcal{T}_C(D)$, the algorithm

$$S_{n,n_0}(f,g) = \frac{1}{n}\sum_{j=1}^{n} f(X_{j+n_0})$$

is an approximation of $S(f,g)$. Suppose that for each step of the Markov chain we use a single oracle call of $\mathrm{Or}_g$. Then $S_{n,n_0}$ needs $n + n_0$ oracle calls of $\mathrm{Or}_g$ and $n$ oracle calls of $\mathrm{Or}_f$. Consequently, $S_{n,n_0} \in \mathrm{Alg}_{n+n_0}$.



4.1. Integration with respect to log-concave densities

Let $r > 0$ and let $B(x,r)$ be the $d$-dimensional Euclidean ball with radius $r$ around $x \in \mathbb{R}^d$. Furthermore, let $B^d = B(0,1)$ and $rB^d = B(0,r)$. The goal is to compute

$$S(f,g) = \frac{\int_{rB^d} f(x)\,g(x)\,dx}{\int_{rB^d} g(x)\,dx} \qquad (4.4)$$

for $(f,g)$ which belong to a certain class of functions. Let us define the class of functions on a convex body $D \subset \mathbb{R}^d$ rather than on $rB^d$. We assume that the state space $D$ is equipped with the Borel $\sigma$-algebra $\mathcal{B}(D)$. We consider functions $(f,g)$ with the following properties:

• Let $g$ be strictly positive and log-concave, i.e. for all $x, y \in D$ and $0 < \lambda < 1$ one has
$$g(\lambda x + (1-\lambda)y) \ge g(x)^{\lambda} \cdot g(y)^{1-\lambda}.$$

• The logarithm of $g$ is Lipschitz continuous, i.e. there exists an $L \ge 0$ such that
$$|\log g(x) - \log g(y)| \le L\,\|x - y\|_E, \qquad x, y \in D,$$
where $\|\cdot\|_E$ denotes the Euclidean norm.

• The integrand $f$ satisfies $\|f\|_p \le 1$.

For $D = rB^d$ one obtains that $\sup g/\inf g \le e^{2Lr}$. Hence $C = e^{2Lr}$, and to have tractability also with respect to $C$, see (4.3), the goal is to show an error bound which depends polynomially on $Lr$. In general, one has the following classes of functions

$$\mathcal{F}_p^L(D) = \left\{(f,g) \mid g \in \mathcal{R}^L(D),\ \|f\|_p \le 1\right\},$$

where

$$\mathcal{R}^L(D) = \left\{g > 0 \mid g \text{ is log-concave},\ |\log g(x) - \log g(y)| \le L\,\|x-y\|_E \text{ for all } x, y \in D\right\}.$$

The idea is to apply the Metropolis algorithm to obtain a Markov chain with stationary distribution $\pi_g$. The proposal transition kernel on $(D, \mathcal{B}(D))$ is given by the ball walk. This random walk is used in [MN07, Rud09] and studied in different references on volume computation, see e.g. [LS93, Vem05].



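The transition kernel of the $\delta$ ball walk is given by

$$Q_\delta(x,A) = \frac{\mathrm{vol}_d(A \cap B(x,\delta))}{\mathrm{vol}_d(B(0,\delta))} + \mathbf{1}_A(x)\left(1 - \frac{\mathrm{vol}_d(D \cap B(x,\delta))}{\mathrm{vol}_d(B(0,\delta))}\right), \qquad x \in D,\ A \in \mathcal{B}(D),$$

where $\delta > 0$ and $\mathrm{vol}_d(A)$ denotes the Lebesgue measure of $A \in \mathcal{B}(D)$. Schematically, a single step of the $\delta$ ball walk from state $x$ may be viewed as in the procedure Ball-Walk($x$, $\delta$).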



Procedure Ball-Walk($x$, $\delta$)

input: current state $x$, radius $\delta$.
output: next state $y$.

Choose $y$ uniformly distributed in $B(x, \delta)$;
if $y \in D$ then
    Return $y$;
else
    Return $x$;
end
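The procedure translates directly into Python. The sketch below is our illustration and the helper names are hypothetical. It draws the uniform point in $B(x,\delta)$ from a normalized Gaussian direction and a radius $\delta U^{1/d}$, a standard construction, and stays at $x$ whenever the proposal leaves $D$. The example run uses $D = B^3$ with $\delta = 0.5$.

```python
import math
import random


def uniform_in_ball(center, radius, rng):
    """Uniform point in B(center, radius): Gaussian direction, radius * U**(1/d)."""
    d = len(center)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    rho = radius * rng.random() ** (1.0 / d)
    return [c + rho * v / norm for c, v in zip(center, z)]


def ball_walk(x, delta, in_D, rng):
    """One step of the delta ball walk: propose in B(x, delta), stay at x if outside D."""
    y = uniform_in_ball(x, delta, rng)
    return y if in_D(y) else list(x)


rng = random.Random(1)
in_unit_ball = lambda y: sum(v * v for v in y) <= 1.0  # D = B^3 as an example
x = [0.9, 0.0, 0.0]
for _ in range(1000):
    y = ball_walk(x, 0.5, in_unit_ball, rng)
    assert math.dist(x, y) <= 0.5 + 1e-12  # a step never leaves B(x, delta)
    x = y
print(x, in_unit_ball(x))
```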



Let us state some well-known properties.

Lemma 4.3. The transition kernel $Q_\delta$ is reversible with respect to the uniform distribution on $D$.

Proof. See [MN07, Proposition 1, p. 685]. □
The local conductance of the ball walk is defined by

$$\ell(x) = \frac{\mathrm{vol}_d(B(x,\delta) \cap D)}{\mathrm{vol}_d(B(0,\delta))}, \qquad x \in D.$$

We call $\ell$ a lower bound of the local conductance if $\ell(x) \ge \ell$ for all $x \in D$. Note that $\ell$ might be very small. For $D = [0,1]^d$, the $d$-dimensional unit cube, one obtains even for small $\delta$ that $\ell = 2^{-d}$. However, one can show for $D = rB^d$ and $\delta \le r/\sqrt{d+1}$ that $\ell = 0.3$ is a lower bound of the local conductance.

Lemma 4.4. Let $Q_\delta$ be the transition kernel of the ball walk on $D = rB^d$ for $r > 0$. If $\delta \le r/\sqrt{d+1}$, then $\ell = 0.3$ is a lower bound of the local conductance of the ball walk.

Proof. The assertion follows by the same arguments as in [MN07, Lemma 7, p. 687], see also [Rud07]. The only difference is that $rB^d$ is a ball with radius $r$ instead of the unit ball. □
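The two claims about the local conductance can be checked numerically. The Python sketch below uses hypothetical helper names and test points of our choosing; it estimates $\ell(x)$ by the fraction of uniform points of $B(x,\delta)$ that lie in $D$. At a corner of $[0,1]^3$ with $\delta = 0.1$ the exact value is $2^{-3}$, and at a boundary point of $B^5$ with $\delta = 1/\sqrt{6}$ the estimate should exceed $0.3$, in agreement with Lemma 4.4. This is a Monte Carlo illustration, not a proof.

```python
import math
import random


def uniform_in_ball(center, radius, rng):
    """Uniform point in B(center, radius): Gaussian direction, radius * U**(1/d)."""
    d = len(center)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    rho = radius * rng.random() ** (1.0 / d)
    return [c + rho * v / norm for c, v in zip(center, z)]


def local_conductance(x, delta, in_D, n_samples=20_000, seed=0):
    """Monte Carlo estimate of vol_d(B(x, delta) ∩ D) / vol_d(B(0, delta))."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if in_D(uniform_in_ball(x, delta, rng)))
    return hits / n_samples


in_cube = lambda y: all(0.0 <= v <= 1.0 for v in y)        # D = [0, 1]^3
corner = local_conductance([0.0, 0.0, 0.0], 0.1, in_cube)   # exact value 2**-3

d = 5
in_ball = lambda y: sum(v * v for v in y) <= 1.0            # D = B^5, i.e. r = 1
boundary = [1.0] + [0.0] * (d - 1)
on_sphere = local_conductance(boundary, 1.0 / math.sqrt(d + 1), in_ball)
print(corner, on_sphere)
```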
The Metropolis transition kernel based on the $\delta$ ball walk is

$$K_{g,\delta}(x,A) = \int_A \theta(x,y)\,Q_\delta(x,dy) + \mathbf{1}_A(x)\left(1 - \int_D \theta(x,y)\,Q_\delta(x,dy)\right),$$

where the acceptance probability is $\theta(x,y) = \min\{1, g(y)/g(x)\}$ for $x, y \in D$, and $A \in \mathcal{B}(D)$.

The lazy version of $K_{g,\delta}$ is denoted by $\tilde K_{g,\delta}$. The transition kernel $\tilde K_{g,\delta}$ is reversible with respect to $\pi_g$. In Algorithm 1 we present the integration algorithm $S_{n,n_0}$ which uses the lazy version of the Metropolis transition kernel with proposal transition kernel $Q_\delta$.



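Algorithm 1: $S_{n,n_0}(f,g)$

input: $n$, $n_0$, $\delta$, $(f,g)$.
output: $S_{n,n_0}(f,g)$.

Choose $X_1$ uniformly distributed in $D$;
for $k = 1$ to $n + n_0$ do
    if rand() > 0.5 then
        $X_{k+1} := X_k$;
    else
        $Y :=$ Ball-Walk($X_k$, $\delta$);
        if $g(Y)/g(X_k) >$ rand() then
            $X_{k+1} := Y$;
        else
            $X_{k+1} := X_k$;
        end
    end
end
Compute
$$S_{n,n_0}(f,g) = \frac{1}{n}\sum_{j=1}^{n} f(X_{j+n_0}).$$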
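Algorithm 1 can be sketched in Python as follows, specialized to $D = rB^d$. The function name and the test problem are hypothetical. A proposal outside $D$ is treated as a rejected move, which is exactly what Ball-Walk returning $X_k$ amounts to, so $g$ is only evaluated inside $D$. In the example $g$ is constant ($L = 0$), so $\pi_g$ is uniform on $B^2$ and $S(f,g) = d/(d+2) = 0.5$ for $f(x) = \|x\|_E^2$.

```python
import math
import random


def uniform_in_ball(center, radius, rng):
    """Uniform point in B(center, radius): Gaussian direction, radius * U**(1/d)."""
    d = len(center)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    rho = radius * rng.random() ** (1.0 / d)
    return [c + rho * v / norm for c, v in zip(center, z)]


def metropolis_ball_walk_integrate(f, g, n, n0, delta, r, d, seed=0):
    """Sketch of Algorithm 1 on D = r B^d: lazy Metropolis chain, ball-walk proposal."""
    rng = random.Random(seed)
    in_D = lambda y: sum(v * v for v in y) <= r * r
    x = uniform_in_ball([0.0] * d, r, rng)  # X_1 uniform in D
    gx = g(x)
    total = f(x) if n0 == 0 else 0.0        # X_1 enters the average only if n0 = 0
    for k in range(1, n + n0 + 1):          # iteration k produces X_{k+1}
        if rng.random() <= 0.5:             # otherwise lazy step: X_{k+1} := X_k
            y = uniform_in_ball(x, delta, rng)  # Ball-Walk(X_k, delta)
            if in_D(y):                     # outside D, Ball-Walk returns X_k
                gy = g(y)
                if gy / gx > rng.random():  # accept with probability min{1, g(Y)/g(X_k)}
                    x, gx = y, gy
        if n0 < k + 1 <= n0 + n:            # accumulate f(X_{j+n0}), j = 1, ..., n
            total += f(x)
    return total / n


d, r = 2, 1.0
delta = r / math.sqrt(d + 1)  # delta* for L = 0 (constant density)
est = metropolis_ball_walk_integrate(
    f=lambda x: sum(v * v for v in x), g=lambda x: 1.0,
    n=40_000, n0=2_000, delta=delta, r=r, d=d)
print(est)  # compare with d / (d + 2) = 0.5
```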



For a transition kernel $K$ it is convenient to use the notation $P_K = P$, $\beta_K = \beta$ and $\Lambda_K = \Lambda$ to indicate the transition kernel. The following proposition provides a lower bound of the $L_2$-spectral gap of $\tilde K_{g,\delta}$. It follows from a result of Mathé and Novak presented in [MN07, Theorem 4, p. 690], where an estimate of the conductance of $K_{g,\delta}$ is shown.

Proposition 4.5. For $r > 0$ let $D \subset \mathbb{R}^d$ be a convex body with

$$\mathrm{diam}(D) = \sup\{\|x - y\|_E \mid x, y \in D\} \le 2r.$$

Let $\ell$ be a lower bound of the local conductance of the $\delta$ ball walk. Then, for all $g \in \mathcal{R}^L(D)$, one has for the lazy version of the Metropolis transition kernel based on a $\delta$ ball walk, given by $\tilde K_{g,\delta}$, that



1 — > min < ttj- -, 1 

^^11.^- 256 \8r2(d+l)' 
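Proof. One has $\beta_{\tilde K_{g,\delta}} = \Lambda_{\tilde K_{g,\delta}} = \frac{1}{2} + \frac{1}{2}\Lambda_{K_{g,\delta}}$. The conductance of $K_{g,\delta}$ is defined by

$$\varphi(K_{g,\delta}, \pi_g) = \inf_{0 < \pi_g(A) \le 1/2} \frac{\int_A K_{g,\delta}(x, A^c)\,\pi_g(dx)}{\pi_g(A)}.$$

One can use the Cheeger inequality, see Proposition A.7. It states that

$$1 - \Lambda_{K_{g,\delta}} \ge \frac{\varphi(K_{g,\delta}, \pi_g)^2}{2}.$$

Altogether one obtains

$$1 - \beta_{\tilde K_{g,\delta}} = \frac{1}{2}\left(1 - \Lambda_{K_{g,\delta}}\right) \ge \frac{\varphi(K_{g,\delta}, \pi_g)^2}{4}. \qquad (4.5)$$

In [MN07, Theorem 4, p. 690] it is shown that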

^ ^e-L* , ( [¥ 16 



r\/d+l 

This lower bound is plugged into (4.5) and the assertion is proven. □
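The identity $\beta_{\tilde K} = \frac12 + \frac12\Lambda_K$ used at the start of the proof can be checked on a toy example. The Python sketch below uses NumPy and a symmetric random walk on a cycle with $10$ states (our choice; it is reversible with respect to the uniform distribution). The walk itself is periodic and has the eigenvalue $-1$, so $\beta = 1$ and it has no $L_2$-spectral gap, while its lazy version $\frac12(I + P)$ has spectrum in $[0,1]$ and satisfies $1 - \beta_{\tilde K} = \frac12(1 - \Lambda_K)$.

```python
import numpy as np


def cycle_walk(m):
    """Simple random walk on Z_m; reversible w.r.t. the uniform distribution."""
    P = np.zeros((m, m))
    for i in range(m):
        P[i, (i + 1) % m] = 0.5
        P[i, (i - 1) % m] = 0.5
    return P


m = 10
P = cycle_walk(m)
eig = np.linalg.eigvalsh(P)        # ascending order; P is symmetric (self-adjoint)
Lambda = eig[-2]                   # sup of the spectrum after removing constants
lazy = 0.5 * (np.eye(m) + P)       # lazy version: stay put with probability 1/2
eig_lazy = np.linalg.eigvalsh(lazy)
beta_lazy = max(abs(eig_lazy[0]), abs(eig_lazy[-2]))
print(eig[0], Lambda, beta_lazy)
```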

In the previous result one can see that the lower bound of the local conductance is crucial. This motivates considering $D = rB^d$, since Lemma 4.4 provides a lower bound of the local conductance. An immediate consequence of the last proposition follows.

Corollary 4.6. For $r > 0$ let $D = rB^d$, assume that $g \in \mathcal{R}^L(rB^d)$ and choose $\delta^* = \min\left\{\frac{1}{L}, \frac{r}{\sqrt{d+1}}\right\}$. Then we have

$$1 - \beta_{\tilde K_{g,\delta^*}} \ge \frac{1.69 \cdot 10^{-6}}{d+1}\min\left\{\frac{1}{r^2L^2}, \frac{1}{d+1}\right\}.$$

Proof. The assertion is implied by Proposition 4.5 and Lemma 4.4. □

In particular, one obtains that the lazy version of the ball walk has an $L_2$-spectral gap, since one can consider constant densities, where $L = 0$.

Corollary 4.7. For $r > 0$ let $D = rB^d$ and let $\delta = r/\sqrt{d+1}$. Then the lazy version $\tilde Q_\delta$ of the transition kernel of the ball walk obeys

$$1 - \beta_{\tilde Q_\delta} \ge \frac{1.69 \cdot 10^{-6}}{(d+1)^2}.$$



Now we can apply the error bounds of Section 3.2. The next theorem states an error bound for $S_{n,n_0}(f,g)$ where $(f,g) \in \mathcal{F}_p^L(rB^d)$.

Theorem 4.8. For $r > 0$ let $D = rB^d$ and let $\nu$ be the uniform distribution on $rB^d$. Let $g \in \mathcal{R}^L(rB^d)$ and $\delta^* = \min\{1/L, r/\sqrt{d+1}\}$. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition kernel $\tilde K_{g,\delta^*}$ and initial distribution $\nu$. The approximation of $S(f,g)$ is

$$S_{n,n_0}(f,g) = \frac{1}{n}\sum_{j=1}^{n} f(X_{j+n_0}).$$
For $p \in (2,\infty]$ recall that

$$\mathcal{F}_p^L(rB^d) = \left\{(f,g) \mid g \in \mathcal{R}^L(rB^d),\ \|f\|_p \le 1\right\}.$$

Let $n_0(p)$ be the smallest natural number (including zero) greater than or equal to

$$5.92 \cdot 10^5\,(d+1)\max\{r^2L^2, d+1\} \cdot \begin{cases} \frac{p}{2(p-2)}\left(2Lr + \log\left(\frac{32p}{p-2}\right)\right), & p \in (2,4), \\ 2Lr + 4.16, & p \in [4,\infty]. \end{cases}$$

Then

$$e(S_{n,n_0(p)}, \mathcal{F}_p^L(rB^d)) \le \frac{1089}{\sqrt{n}}\sqrt{d+1}\,\max\{rL, \sqrt{d+1}\} + \frac{8.38 \cdot 10^5}{n}(d+1)\max\{r^2L^2, d+1\}.$$



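Proof. The initial distribution obeys

$$\nu(A) = \frac{\mathrm{vol}_d(A)}{\mathrm{vol}_d(rB^d)} = \int_A \frac{\int_{rB^d} g(y)\,dy}{\mathrm{vol}_d(rB^d)\,g(x)}\,\pi_g(dx), \qquad A \in \mathcal{B}(rB^d).$$

Since $\log g$ is Lipschitz continuous with Lipschitz constant $L$, we obtain

$$e^{-2Lr} \le \frac{g(y)}{g(x)} \le e^{2Lr}, \qquad x, y \in rB^d,$$

so that for all $q \in [1,\infty]$

$$\left\|\frac{d\nu}{d\pi_g} - 1\right\|_q \le \left\|\frac{d\nu}{d\pi_g} - 1\right\|_\infty \le e^{2Lr}.$$

By Corollary 4.6 we have the crucial lower bound for the spectral gap $1 - \beta_{\tilde K_{g,\delta^*}}$, and consequently Theorem 3.45 (iii) can be applied, which proves the assertion. □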
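The constants in Theorem 4.8 can be traced with a few lines of Python. The sketch uses hypothetical function names; it takes $1/(1-\beta) \le 5.92 \cdot 10^5 (d+1)\max\{r^2L^2, d+1\}$ from Corollary 4.6, evaluates the burn-in $n_0(p)$ as stated, and compares $\sqrt{2/(n(1-\beta)) + 2/(n^2(1-\beta)^2)}$ with the closed form of the theorem. The comparison relies on $\sqrt{a+b} \le \sqrt a + \sqrt b$, $\sqrt{2 \cdot 5.92 \cdot 10^5} \approx 1088.1 \le 1089$ and $\sqrt{2} \cdot 5.92 \cdot 10^5 \approx 8.372 \cdot 10^5 \le 8.38 \cdot 10^5$; this is our reading of where the constants come from, not a statement from the source.

```python
import math

INV_GAP_CONST = 5.92e5  # reciprocal of 1.69e-6 from Corollary 4.6, rounded up


def inv_gap_bound(d, r, L):
    """Upper bound on 1/(1 - beta) for the lazy Metropolis ball walk (Corollary 4.6)."""
    return INV_GAP_CONST * (d + 1) * max(r * r * L * L, d + 1)


def burn_in(d, r, L, p):
    """Burn-in n0(p) of Theorem 4.8."""
    if p >= 4:
        factor = 2 * L * r + 4.16
    elif p > 2:
        factor = p / (2 * (p - 2)) * (2 * L * r + math.log(32 * p / (p - 2)))
    else:
        raise ValueError("Theorem 4.8 needs p > 2")
    return math.ceil(inv_gap_bound(d, r, L) * factor)


def error_bound_squared_form(d, r, L, n):
    """sqrt(2/(n(1-beta)) + 2/(n(1-beta))^2) with the lower bound on 1 - beta."""
    x = n / inv_gap_bound(d, r, L)
    return math.sqrt(2 / x + 2 / x ** 2)


def error_bound_theorem(d, r, L, n):
    """Closed form of the error bound stated in Theorem 4.8."""
    return (1089 / math.sqrt(n) * math.sqrt(d + 1) * max(r * L, math.sqrt(d + 1))
            + 8.38e5 / n * (d + 1) * max(r * r * L * L, d + 1))


for d, r, L, n in [(2, 1.0, 0.0, 10**9), (10, 2.0, 3.0, 10**12)]:
    print(burn_in(d, r, L, p=4), error_bound_squared_form(d, r, L, n),
          error_bound_theorem(d, r, L, n))
```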



Note that $p \in (2,\infty]$ is necessary to apply Theorem 3.45 (iii). An essential consequence of the last theorem is the following result concerning the tractability of (4.4).

Theorem 4.9. For the integration problem $S(f,g)$ defined over $\mathcal{F}_p^L(rB^d)$ with $r > 0$ and $p > 2$ we have

comp(e,d, J"^(rB'')) < (rf + 1) max {r^ l?,d+l} 



1.2 • 10*^ • <^ 

|2Lr + 4.16, 



£_(Lr- + 0.51og|2|), pe(2,4) 



p e [4, 00] 
for all $\varepsilon \in (0,1)$ and $d \in \mathbb{N}$.

The last theorem states that the problem (4.4) is polynomially tractable. Roughly speaking, for fixed $p$ one obtains

$$\mathrm{comp}(\varepsilon, d, \mathcal{F}_p^L(rB^d)) \preceq d\max\{r^2L^2, d\}\,(\varepsilon^{-2} + Lr),$$

so that the dependence on $L$, on the precision $\varepsilon$, on the dimension $d$ and on $r$ is polynomial. We also have tractability with respect to $C = e^{2Lr}$: inequality (4.3) holds with $q_1 = 2$, $q_2 = 2$ and $q_3 = 3$. For $p \in [4,\infty]$ the complexity can be bounded independently of $p$. If $p$ converges to 2, the result becomes restrictive. However, for fixed $p \in (2,\infty]$ we showed that the integration problem on $\mathcal{F}_p^L(rB^d)$ is polynomially tractable in the sense of (4.3).
4.2. Integration over a convex body

The goal is to compute

$$S(f,A) = \frac{1}{\mathrm{vol}_d(A)}\int_A f(x)\,dx, \qquad (4.6)$$

with $A \subset \mathbb{R}^d$. In other words, $S(f,A)$ is the expectation of $f$ with respect to the uniform distribution, say $\mu_A$, on $A \subset \mathbb{R}^d$. The domain $A$ and the function $f$ are the input quantities. It fits in the class of problems described by (4.2) if we assume that $A \subset D$. Then $\mu_A$ might be considered as given by a density which is an indicator function.

For some domains $A$ it is indeed simple to generate uniformly distributed random points, e.g. the Euclidean unit ball or the unit cube. Then one can approximate $S(f,A)$ by Monte Carlo methods with an i.i.d. sample. However, here $A$ is part of the input to the algorithm; thus the problem $S(f,A)$ shall be solved uniformly for a class of state spaces, where we do not assume that sampling with respect to the uniform distribution is possible.

Let $r > 1$ and let

$$\mathcal{S}_d(r) = \{A \subset \mathbb{R}^d \text{ convex} \mid B^d \subset A \subset rB^d\}.$$

If $A \in \mathcal{S}_d(r)$, then $A$ is a convex bounded set with non-empty interior which contains the origin. The class of input parameters is given by

$$\mathcal{F}_p(r,d) = \{(f,A) \mid \|f\|_p \le 1,\ A \in \mathcal{S}_d(r)\}.$$

We assume that for any $A \in \mathcal{S}_d(r)$ there exists an oracle $\mathrm{Or}_A(\ell)$ which returns for an arbitrary line $\ell$ a uniformly distributed random point on $A \cap \ell$.

Let us comment on this assumption, since it seems restrictive. Assume that we have a membership oracle of $A \in \mathcal{S}_d(r)$ which is given by $\mathrm{Or}_A(x) = \mathbf{1}_A(x)$ for any $x \in rB^d$. The line oracle $\mathrm{Or}_A(\ell)$ can be implemented by using the membership oracle. For $x, y \in \mathbb{R}^d$ let $[x,y] = \{x + t(y-x) \mid t \in [0,1]\}$ be the segment between $x$ and $y$, which has Euclidean length $\|x - y\|_E$. By the convexity of $A$ it follows that $A \cap \ell$ is a single segment, hence there exist $a_1, a_2 \in \mathbb{R}^d$ such that $[a_1, a_2] = A \cap \ell$. Suppose that $\ell = \{x + t\,\mathrm{dir} \mid t \in \mathbb{R}\}$ with $x \in A$, and assume that there is a positive number $\varepsilon_0$ such that $\|a_1 - a_2\|_E \ge \varepsilon_0$. We use that $A \in \mathcal{S}_d(r)$ and $x \in A$. By a bisection method one can find, with at most $3\log(2r/\varepsilon_0) + 2$ calls of the membership oracle, a segment $[b_1, b_2]$ with $b_1, b_2 \in \mathbb{R}^d$ and $[a_1, a_2] \subset [b_1, b_2]$ such that

$$\frac{1}{6}\|b_1 - b_2\|_E \le \|a_1 - a_2\|_E \le \|b_1 - b_2\|_E.$$

Then choose a uniformly distributed random point in $[b_1, b_2]$ and accept it if it is in $A$; otherwise reject it and repeat the acceptance-rejection procedure. This procedure gives a uniformly distributed random point in $A \cap \ell$ and works reasonably fast, since the acceptance probability is at least $1/6$. Altogether, a call of $\mathrm{Or}_A(\ell)$ requires at most an expected number of $3\log(2r/\varepsilon_0) + 8$ calls of the membership oracle. In the analysis of the error we count the calls of the oracle $\mathrm{Or}_A(\ell)$ and the function evaluations of $f$, i.e. the calls of the oracle $\mathrm{Or}_f$.
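The bisection and acceptance-rejection scheme can be sketched in Python as below. This is an illustration under stated assumptions, not the author's code: the name `line_oracle` and the parameter `iters` are hypothetical, and instead of stopping the bisection according to $\varepsilon_0$ it simply runs a fixed number of halvings starting from the interval $[0, 2r+1]$, whose right end is guaranteed to lie outside $rB^d$ and hence outside $A$. Because $A \cap \ell$ is a segment, accepting a uniform point of the outer segment exactly when it lies in $A$ yields a uniform point of $A \cap \ell$.

```python
import random


def line_oracle(member, x, direction, r, rng, iters=40):
    """Uniform point on A ∩ {x + t*direction : t real}, using only a membership oracle.

    Assumes x in A, |direction| = 1 and A ⊂ r B^d, so every chord has length <= 2r.
    """
    def outer_end(sign):
        lo, hi = 0.0, 2.0 * r + 1.0  # x + sign*lo*dir is in A, x + sign*hi*dir is not
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            point = [xi + sign * mid * di for xi, di in zip(x, direction)]
            if member(point):
                lo = mid
            else:
                hi = mid
        return hi  # upper estimate of the distance from x to the boundary of A

    t_plus, t_minus = outer_end(1.0), outer_end(-1.0)
    while True:  # acceptance-rejection on the outer segment [-t_minus, t_plus]
        t = rng.uniform(-t_minus, t_plus)
        y = [xi + t * di for xi, di in zip(x, direction)]
        if member(y):
            return y


rng = random.Random(0)
in_cube = lambda y: all(-1.0 <= v <= 1.0 for v in y)  # A = [-1, 1]^3 with r = sqrt(3)
draws = [line_oracle(in_cube, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 3 ** 0.5, rng)
         for _ in range(2000)]
mean_first = sum(y[0] for y in draws) / len(draws)
print(mean_first)
```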

Now let us provide a Markov chain on the measurable space $(A, \mathcal{B}(A))$ with stationary distribution $\mu_A$. We consider the classical hit-and-run algorithm, also called the hypersphere directions algorithm, see [Smi84]. The algorithm is studied and analyzed in [Lov99, LV06]. The work of Vempala [Vem05] provides an introduction to geometric random walks.



The algorithm is as follows. Suppose that the current position is $X_i \in A$ with $i \in \mathbb{N}$. Then choose a uniformly distributed direction, say $\mathrm{dir}_i$, and consider the line which is defined by $\ell^{(i)} = \{X_i + t\,\mathrm{dir}_i \mid t \in \mathbb{R}\}$. Apply $\mathrm{Or}_A(\ell^{(i)})$, which gives the next state $X_{i+1}$ chosen uniformly distributed in $\ell^{(i)} \cap A$. Then, again, a uniformly distributed direction, say $\mathrm{dir}_{i+1}$, is generated, and the next state is chosen uniformly distributed on $\ell^{(i+1)} \cap A$ by $\mathrm{Or}_A(\ell^{(i+1)})$. Two consecutive steps of the hit-and-run algorithm are illustrated in Figure 4.1. Recall that the Euclidean unit ball is denoted by $B^d$ and its boundary is denoted by $\partial B^d$. Schematically, a single step of the hit-and-run algorithm from $x \in A$ may be viewed as in the procedure Hit-and-Run($x$).

Figure 4.1: Illustration of the generation of $X_2$ and $X_3$ by the hit-and-run algorithm given state $X_1$.

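Procedure Hit-and-Run($x$)

input: current state $x$.
output: next state $y$.

Choose a direction $\mathrm{dir}$ uniformly distributed on $\partial B^d$;
Choose $y$ uniformly distributed on $A \cap \{x + t\,\mathrm{dir} \mid t \in \mathbb{R}\}$;
Return $y$.

The transition kernel of the hit-and-run algorithm follows. For any $x, y \in \mathbb{R}^d$ with $x \ne y$ let

$$\mathrm{Int}(x,y) = \left\{\alpha \in \mathbb{R} \;\middle|\; x + \alpha\,\frac{y-x}{\|y-x\|_E} \in A\right\}.$$

Since $A$ is convex, $\mathrm{Int}(x,y)$ is an interval. Let

$$\lambda_1(x,y) = \min\{\alpha \mid \alpha \in \mathrm{Int}(x,y)\} \quad\text{and}\quad \lambda_2(x,y) = \max\{\alpha \mid \alpha \in \mathrm{Int}(x,y)\},$$

which implies that $\mathrm{Int}(x,y) = [\lambda_1(x,y), \lambda_2(x,y)]$. The length of the chord $\mathrm{Int}(x,y)$ is given by $\ell(x,y) = \lambda_2(x,y) - \lambda_1(x,y)$. Let $U(x,y)$ be a uniformly distributed random variable in the interval $\mathrm{Int}(x,y)$. Then the hit-and-run transition kernel $H$ of the hit-and-run algorithm is

$$\begin{aligned} H(x,C) &= \frac{1}{\mathrm{vol}_{d-1}(\partial B^d)}\int_{\partial B^d} \Pr[x + U(x,x+\theta)\,\theta \in C]\,d\theta \\ &= \frac{1}{\mathrm{vol}_{d-1}(\partial B^d)}\int_{\partial B^d}\int_{\lambda_1(x,x+\theta)}^{\lambda_2(x,x+\theta)} \frac{\mathbf{1}_C(x+\lambda\theta)}{\ell(x,x+\theta)}\,d\lambda\,d\theta \\ &= \frac{1}{\mathrm{vol}_{d-1}(\partial B^d)}\int_{\partial B^d}\left(\int_0^{\lambda_2(x,x+\theta)} \frac{\mathbf{1}_C(x+\lambda\theta)}{\ell(x,x+\theta)}\,d\lambda + \int_0^{-\lambda_1(x,x+\theta)} \frac{\mathbf{1}_C(x-\lambda\theta)}{\ell(x,x+\theta)}\,d\lambda\right) d\theta \\ &= \frac{2}{\mathrm{vol}_{d-1}(\partial B^d)}\int_C \frac{dy}{\ell(x,y)\,\|x-y\|_E^{d-1}}, \end{aligned} \qquad (4.7)$$

where $x \in A$ and $C \in \mathcal{B}(A)$. The last equality follows by the integral transformation formula

$$\int_{\mathbb{R}^d} f(v)\,dv = \int_{\partial B^d}\int_0^\infty f(g(\lambda,\theta))\,\lambda^{d-1}\,d\lambda\,d\theta,$$

where either $g(\lambda,\theta) = x + \lambda\theta$ or $g(\lambda,\theta) = x - \lambda\theta$.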
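When $A$ is a ball, the chord $\mathrm{Int}(x, x+u)$ for a unit vector $u$ is available in closed form by solving a quadratic equation, so one step of Hit-and-Run can be sketched without any oracle. The Python code below is our illustration for $A = B^d$ and uses hypothetical names; it draws the direction as a normalized Gaussian vector and then a uniform point on $[\lambda_1, \lambda_2]$. Since $H$ is reversible with respect to $\mu_A$ by the next lemma, the running average of $\|X_k\|_E^2$ should approach $d/(d+2) = 0.6$ for $d = 3$.

```python
import math
import random


def random_direction(d, rng):
    """Uniformly distributed point on the unit sphere (normalized Gaussian vector)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def hit_and_run_ball(x, radius, rng):
    """One hit-and-run step in A = radius * B^d with the chord computed in closed form."""
    u = random_direction(len(x), rng)
    # x + t u lies in A  <=>  t^2 + 2 t <x, u> + |x|^2 - radius^2 <= 0   (|u| = 1)
    b = sum(xi * ui for xi, ui in zip(x, u))
    c = sum(xi * xi for xi in x) - radius * radius
    root = math.sqrt(max(b * b - c, 0.0))
    lam1, lam2 = -b - root, -b + root  # Int(x, x + u) = [lam1, lam2]
    t = rng.uniform(lam1, lam2)         # realization of U(x, x + u)
    return [xi + t * ui for xi, ui in zip(x, u)]


rng = random.Random(2)
d = 3
x = [0.0] * d
sq_norms = []
for _ in range(20_000):
    x = hit_and_run_ball(x, 1.0, rng)
    sq_norms.append(sum(v * v for v in x))
print(sum(sq_norms) / len(sq_norms))  # compare with d / (d + 2) = 0.6
```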

Lemma 4.10. The hit-and-run transition kernel $H$, given by (4.7), is reversible with respect to $\mu_A$ on $A$.

Proof. Let $k(x,y)$ be a symmetric transition density of a transition kernel $K$, i.e. $k(x,y) = k(y,x)$ for all $x, y \in A$. Then it follows by Fubini's theorem that

$$\begin{aligned} \int_B K(x,C)\,\mu_A(dx) &= \int_B\int_C k(x,y)\,\mu_A(dy)\,\mu_A(dx) = \int_C\int_B k(x,y)\,\mu_A(dx)\,\mu_A(dy) \\ &= \int_C\int_B k(y,x)\,\mu_A(dx)\,\mu_A(dy) = \int_C K(x,B)\,\mu_A(dx), \qquad B, C \in \mathcal{B}(A). \end{aligned}$$

Hence the transition kernel $K$ is reversible with respect to $\mu_A$. Since $\ell(x,y) = \ell(y,x)$, one obtains that the transition kernel $H$ has a symmetric density, and this implies that it is reversible with respect to $\mu_A$. □

The lazy version of $H$ is denoted by $\tilde H$. In Algorithm 2 we present the integration algorithm $S_{n,n_0}(f,A)$ which uses the lazy version of the hit-and-run transition kernel. We use the notation $P_K = P$, $\beta_K = \beta$ and $\Lambda_K = \Lambda$ to indicate the transition kernel $K$. The following proposition provides a lower bound of the $L_2$-spectral gap of $P_{\tilde H}$. It is a straightforward implication of a result of Lovász and Vempala presented in [LV06, Theorem 4.2, p. 993], where an estimate of the conductance of $H$ is shown.

Proposition 4.11. Let $r > 1$. Then, for all $A \in \mathcal{S}_d(r)$, one has for the lazy version of the hit-and-run transition kernel, given by $\tilde H$, that

$$1 - \beta_{\tilde H} \ge 2^{-52}(dr)^{-2}.$$
Algorithm 2: $S_{n,n_0}(f,A)$

input: $n$, $n_0$, $(f, A)$.
output: $S_{n,n_0}(f,A)$.

Choose $X_1$ uniformly distributed in $B^d$;
for $k = 1$ to $n + n_0$ do
    if rand() > 0.5 then
        $X_{k+1} := X_k$;
    else
        $X_{k+1} :=$ Hit-and-Run($X_k$);
    end
end
Compute
$$S_{n,n_0}(f,A) = \frac{1}{n}\sum_{i=1}^{n} f(X_{i+n_0}).$$
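Combining the lazy step of Algorithm 2 with a Hit-and-Run step built on the membership-oracle construction described earlier gives the following Python sketch. It is self-contained, its names are hypothetical, and it assumes that $A \in \mathcal{S}_d(r)$ is given only through a membership test, using a fixed number of bisection steps instead of an $\varepsilon_0$-based rule. For the cube $A = [-1,1]^3$, which belongs to $\mathcal{S}_3(\sqrt3)$, and $f(x) = x_1^2$, the exact value is $S(f,A) = 1/3$.

```python
import math
import random


def random_direction(d, rng):
    """Uniformly distributed point on the unit sphere."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def uniform_in_unit_ball(d, rng):
    """Uniform point in B^d: uniform direction times U**(1/d)."""
    rho = rng.random() ** (1.0 / d)
    return [rho * v for v in random_direction(d, rng)]


def line_oracle(member, x, direction, r, rng, iters=30):
    """Uniform point on A ∩ line via bisection plus acceptance-rejection (x in A ⊂ r B^d)."""
    def outer_end(sign):
        lo, hi = 0.0, 2.0 * r + 1.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if member([xi + sign * mid * di for xi, di in zip(x, direction)]):
                lo = mid
            else:
                hi = mid
        return hi

    t_plus, t_minus = outer_end(1.0), outer_end(-1.0)
    while True:
        t = rng.uniform(-t_minus, t_plus)
        y = [xi + t * di for xi, di in zip(x, direction)]
        if member(y):
            return y


def hit_and_run_integrate(f, member, d, r, n, n0, seed=0):
    """Sketch of Algorithm 2: lazy hit-and-run chain started uniformly in B^d."""
    rng = random.Random(seed)
    x = uniform_in_unit_ball(d, rng)       # X_1 uniform on B^d ⊂ A
    total = f(x) if n0 == 0 else 0.0
    for k in range(1, n + n0 + 1):         # iteration k produces X_{k+1}
        if rng.random() <= 0.5:            # otherwise lazy step: X_{k+1} := X_k
            x = line_oracle(member, x, random_direction(d, rng), r, rng)
        if n0 < k + 1 <= n0 + n:
            total += f(x)
    return total / n


in_cube = lambda y: all(-1.0 <= v <= 1.0 for v in y)  # A = [-1, 1]^3
est = hit_and_run_integrate(lambda y: y[0] ** 2, in_cube, d=3, r=math.sqrt(3),
                            n=20_000, n0=500)
print(est)  # compare with the exact value 1/3
```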



Proof. In [LV06, Theorem 4.2, p. 993] it is proven that

Then the proof follows by the same arguments as the proof of Proposition 4.5. □

Now we can apply the error bounds of Section 3.2 and obtain the following.

Theorem 4.12. Let $\nu$ be the uniform distribution on $B^d$. Let $(X_n)_{n\in\mathbb N}$ be a Markov chain with transition kernel $\tilde H$ and initial distribution $\nu$. The approximation of $S(f,A)$ is

$$S^{\tilde H}_{n,n_0}(f,A) = \frac{1}{n}\sum_{i=1}^{n} f(X_{i+n_0}).$$

For $r\ge 1$ and $p>2$ recall that

$$\mathcal F_p(r,d) = \{(f,A) \mid \|f\|_p\le 1,\ A\in\mathcal S_d(r)\}.$$

Let $n_0(p)$ be the smallest natural number (including zero) greater than or equal to

ir 9 , ( ,^r^id^ogr + \og^), pe(2,4), 
4.51 • 10 d r • < "^^^ ' 6p-2^' V ' 



$d\log r + 4.16$, $p\in[4,\infty]$.



Then 



fj2 2 



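Proof. Note that the initial distribution $\nu$ is well defined, since for $A\in\mathcal S_d(r)$ one has $B^d\subset A\subset rB^d$. Furthermore, it follows that $\nu$ has the density

$$\frac{d\nu}{d\mu_A}(x) = \frac{\mathrm{vol}_d(A)}{\mathrm{vol}_d(B^d)}\,\mathbf 1_{B^d}(x),\qquad x\in A,$$

with respect to $\mu_A$. One obtains

$$\left\|\frac{d\nu}{d\mu_A}-1\right\|_s \le \left\|\frac{d\nu}{d\mu_A}-1\right\|_\infty \le \frac{\mathrm{vol}_d(A)}{\mathrm{vol}_d(B^d)} \le r^d,\qquad s\in[1,\infty].$$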



By Proposition 4.11 we have the crucial lower bound for the spectral gap $1-\beta_{\tilde H}$, and consequently Theorem 3.35 can be applied. Hence the assertion is proven. □
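The density bound used in the proof can be evaluated for a concrete body. The sketch below assumes, for illustration only, the cube $A=[-1,1]^d$, which is convex and satisfies $B^d\subset A\subset\sqrt d\,B^d$, so $A\in\mathcal S_d(\sqrt d)$. Since $\nu$ is uniform on $B^d$, $d\nu/d\mu_A$ equals $\mathrm{vol}_d(A)/\mathrm{vol}_d(B^d)$ on $B^d$ and $0$ on $A\setminus B^d$, hence $\sup|d\nu/d\mu_A-1| = \max\{\mathrm{vol}_d(A)/\mathrm{vol}_d(B^d)-1,\,1\}$. This ratio is at most $r^d$, whose logarithm is the term $d\log r$ entering $n_0(p)$.

```python
import math

def log_unit_ball_volume(d):
    """log vol_d(B^d) = (d/2) log(pi) - log Gamma(d/2 + 1)."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)

def sup_density_deviation_cube(d):
    """Return (sup |dnu/dmu_A - 1|, vol(A)/vol(B^d)) for A = [-1, 1]^d.

    dnu/dmu_A equals the volume ratio on B^d and 0 on the rest of A,
    so the supremum of |dnu/dmu_A - 1| is max(ratio - 1, 1).
    """
    ratio = math.exp(d * math.log(2.0) - log_unit_ball_volume(d))
    return max(ratio - 1.0, 1.0), ratio

for d in (2, 5, 10, 20):
    dev, ratio = sup_density_deviation_cube(d)
    r = math.sqrt(d)                           # [-1, 1]^d lies in sqrt(d) * B^d
    print(d, dev, ratio, r ** d, dev <= ratio <= r ** d)
```

For the cube the volume ratio grows much more slowly than $r^d = d^{d/2}$, so the bound $r^d$ is crude there; it is, however, valid uniformly over $\mathcal S_d(r)$.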

Note that $p>2$ is necessary to apply Theorem 3.35. A consequence of the last theorem is the following result concerning the tractability of the integration problem (4.6).

Theorem 4.13. For the integration problem $S(f,A)$ defined over $\mathcal F_p(r,d)$ with $r\ge 1$ and $p>2$ we have



conip(e, J^p(r, d)) < d^r^ 



4- lO^^e^^ 



5- 10^5 



2(^(dlogr + log|^), pe(2,4) 
$d\log r + 4.16$, $p\in[4,\infty]$



for all $\varepsilon\in(0,1)$ and $d\in\mathbb N$.

The last theorem states that (4.6) is polynomially tractable. Roughly speaking, for fixed $p$ one obtains

$$\mathrm{comp}(\varepsilon,\mathcal F_p(r,d)) \lesssim d^2r^2\left(\varepsilon^{-2} + d\log r\right),$$

so that the dependence on the precision $\varepsilon$, the dimension $d$ and $r$ is polynomial. For $p\in[4,\infty]$ the complexity can be bounded independently of $p$. If $p$ tends to $2$, the result becomes restrictive. However, for fixed $p>2$ we showed that the integration problem is polynomially tractable on $\mathcal F_p(r,d)$.



4.3. Notes and remarks 

Let us briefly summarize the features of the last sections and mention additional results from the literature. In Section 4.1 elementary state spaces were considered, namely balls, and the distribution $\pi_g$ determined by $g$ could be complicated. In Section 4.2 the distribution of interest was simple, namely the uniform one, and the state space was possibly complicated.

The problem of integration (4.1), stated in the form

$$S(f,g) = \frac{\int_D f(x)g(x)\,dx}{\int_D g(x)\,dx},$$

is formulated as in the work of Mathé and Novak [MN07]. There the authors also proved an asymptotic error bound for the Metropolis algorithm based on the ball walk. They studied the algorithm $S^{\delta}_{n,n_0}$, and for $\delta^* = \min\{(d+1)^{-1/2}, L^{-1}\}$ it is shown in [MN07, Theorem 5, p. 693] that

$$\lim_{n\to\infty} n\cdot e\big(S^{\delta^*}_{n,n_0},\mathcal F_2(B^d)\big)^2 \le 594700\cdot(d+1)\max\{d+1, L^2\}.$$
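The Metropolis algorithm based on the ball walk can be sketched as follows for $D=B^d$. This is an illustration under stated assumptions, not the exact algorithm analysed in [MN07]: the proposal is uniform in the ball of radius $\delta$ around the current state, proposals outside $D$ are rejected, the chain starts at the origin, and the step size $\delta=\min\{(d+1)^{-1/2},L^{-1}\}$ in the example is an assumed choice. Only the unnormalized density $g$ is needed, passed as $\log g$; the function name is hypothetical.

```python
import numpy as np

def metropolis_ball_walk_mean(f, log_g, d, delta, n, n0, seed=0):
    """Sketch: estimate S(f, g) on D = B^d with a Metropolis ball walk.

    A proposal y is uniform in the ball of radius delta around x. If y leaves D
    the chain stays at x; otherwise y is accepted with probability
    min(1, g(y) / g(x)). The first n0 states are discarded as burn-in.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)                            # hypothetical start: centre of B^d
    lg_x = log_g(x)
    total = 0.0
    for k in range(n + n0):
        u = rng.standard_normal(d)
        y = x + delta * rng.random() ** (1.0 / d) * u / np.linalg.norm(u)
        if np.linalg.norm(y) <= 1.0:           # proposal inside D = B^d
            lg_y = log_g(y)
            if rng.random() < np.exp(min(0.0, lg_y - lg_x)):
                x, lg_x = y, lg_y
        if k >= n0:
            total += f(x)
    return total / n

# Example: g(x) = exp(x_1), so log g is Lipschitz with L = 1; f(x) = x_1.
d, lip = 2, 1.0
delta = min((d + 1) ** -0.5, 1.0 / lip)        # assumed step size
est = metropolis_ball_walk_mean(lambda x: x[0], lambda x: x[0], d, delta, n=60000, n0=1000)
print(est)
```

Working with $\log g$ avoids overflow when $g$ varies over many orders of magnitude, and the normalizing constant $\int_D g(x)\,dx$ never has to be computed.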

The first non-asymptotic error bound is proven in [Rud09] for the class F^{B'^). It states 
that for no > 1.28 • 10^ • L(d + 1) max {d + 1, L^} the error obeys 

e«„o,-^^(S')) < ^VdTTmax{VdTT,L} . 
Theorem 4.8 extends this result. There the integrands $f$ belong to $L_p$ with $p>2$, and the domain is $rB^d$. The constants in the error bound are of the same magnitude, and the dependence on the dimension $d$, the Lipschitz constant $L$ and the precision $\varepsilon$ is the same. The problem is tractable in the sense of (4.3).

Apart from the asymptotic result of [MN07, Theorem 5, p. 693], it is always assumed that the integrand $f$ belongs to $L_p$ for $p>2$. The case $f\in L_2$ is not covered so far. To apply Theorem 3.34 it is sufficient to have a transition kernel which is reversible with respect to the desired distribution and uniformly ergodic with $(\alpha,M)$. It is well known that the ball walk, the Metropolis algorithm based on the ball walk and the hit-and-run algorithm are uniformly ergodic, see [Smi84, KS98, MN07]. However, as far as we know, there is no estimate of the numbers $\alpha\in[0,1)$ and $M<\infty$ of the uniform ergodicity with $(\alpha,M)$ that yields polynomial tractability. We get polynomial tractability if there exist non-negative numbers $c$ and $q$ such that $(1-\alpha)^{-1}\le c\,d^q$. One can prove the following. Let $D = B^d$ and $\delta = 2/\sqrt{d+1}$. Then the ball walk $Q_\delta$ is uniformly ergodic with $(\alpha, M)$, where



Unfortunately, the crucial quantity $(1-\alpha)^{-1}$ is exponentially bad in $d$. Hence this is not enough to prove polynomial tractability. It is not clear whether one can get a significantly better $\alpha$.

The hit-and-run algorithm has been studied in various works on volume computation and optimization. However, as far as we know, it has not yet been applied to integration problems of the form (4.6). There is an immediate generalization of the hit-and-run algorithm which can be used to sample a distribution given by a log-concave density, see for example [LV06, p. 987]. This might be used to obtain further error bounds for other classes of functions.



0.15 



and M = 100. 



a = 1 
A. Appendix



Some aspects of functional analysis are fundamental for understanding the error of Markov chain Monte Carlo. We present the Spectral Theorem for linear, bounded and self-adjoint operators. Then we state the Interpolation Theorem of Riesz-Thorin for operators acting on $L_p$. Afterwards the conductance and the Cheeger inequality are introduced.

A.1. Spectral Theorem

We state the Spectral Theorem for linear, self-adjoint and bounded operators. For further reading, proofs and details we refer to [KG82, Rud91, Tri92]. For an introduction see [Kre89].

Let $H$ be a real or complex Hilbert space and let $\mathcal L(H)$ be the space of all linear and bounded operators mapping from $H$ to $H$. Let $\mathcal B(\mathbb R)$ be the Borel $\sigma$-algebra over $\mathbb R$.

Definition A.1 (spectral measure). A spectral measure or a projection-valued measure is a mapping $E\colon\mathcal B(\mathbb R)\to\mathcal L(H)$ with the following properties:

(i) for all $A\in\mathcal B(\mathbb R)$ the operator $E_A$ is an orthogonal projection,

(ii) $E_\emptyset = 0$, $E_{\mathbb R} = I$, where $I$ is the identity,

(iii) for pairwise disjoint $A_1, A_2, \ldots\in\mathcal B(\mathbb R)$ we have for any $g\in H$ that
$$E_{\bigcup_{i=1}^{\infty} A_i}\, g = \sum_{i=1}^{\infty} E_{A_i}\, g.$$

If there exists a compact set $K\in\mathcal B(\mathbb R)$ with $E_K = I$, then we say that the spectral measure has compact support.

For $f,g\in H$ a signed measure $\omega$ is defined on $(\mathbb R,\mathcal B(\mathbb R))$ by

$$\omega(A) = \langle E_A f, g\rangle,\qquad A\in\mathcal B(\mathbb R).$$
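For $H=\mathbb R^m$ and a symmetric matrix $P$, a spectral measure is a finite sum of eigenprojections: $E_A$ projects onto the span of the eigenvectors whose eigenvalues lie in $A$. The finite-dimensional sketch below builds such an $E$ with NumPy and checks properties (i), (ii), finite additivity, the compact support, and that $\omega(A)=\langle E_Af,f\rangle\ge 0$. Representing a Borel set by its indicator function is a convention of this sketch only.

```python
import numpy as np

def spectral_measure(p):
    """Finite-dimensional spectral measure of a real symmetric matrix p.

    A set A is represented by its indicator function (a convention of this
    sketch); E_A is the orthogonal projection onto the span of eigenvectors
    of p whose eigenvalues lie in A.
    """
    eigvals, eigvecs = np.linalg.eigh(p)
    def e(indicator):
        mask = np.array([indicator(lam) for lam in eigvals], dtype=bool)
        cols = eigvecs[:, mask]                # orthonormal basis of the eigenspace
        return cols @ cols.T
    return e

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4))
p = (a + a.T) / 2                              # self-adjoint operator on H = R^4
E = spectral_measure(p)
E_neg = E(lambda t: t < 0)                     # E_A for A = (-inf, 0)
s = np.abs(p).sum(axis=1).max()                # all eigenvalues lie in K = [-s, s]
f = rng.standard_normal(4)
omega_f = f @ E_neg @ f                        # omega(A) = <E_A f, f> for g = f
print(np.allclose(E_neg @ E_neg, E_neg), np.allclose(E(lambda t: -s <= t <= s), np.eye(4)))
```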

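If $f = g$, then the measure $\omega$ is non-negative. Let $P\in\mathcal L(H)$ be a self-adjoint operator and let us denote the spectrum of $P\colon H\to H$ by $\operatorname{spec}(P|H)$. Furthermore let

$$\lambda = \inf_{\|g\|=1}\langle Pg,g\rangle \quad\text{and}\quad \Lambda = \sup_{\|g\|=1}\langle Pg,g\rangle.$$

The spectrum of $P$ is closed and $\operatorname{spec}(P|H)\subset[\lambda,\Lambda]$. Additionally one has $\lambda,\Lambda\in\operatorname{spec}(P|H)$, thus

$$\lambda = \inf\{\alpha \mid \alpha\in\operatorname{spec}(P|H)\} \quad\text{and}\quad \Lambda = \sup\{\alpha \mid \alpha\in\operatorname{spec}(P|H)\}.$$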
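The identities for $\lambda$ and $\Lambda$ can be illustrated for a symmetric matrix on $H=\mathbb R^5$ (an arbitrary random example): every quotient $\langle Pg,g\rangle$ with $\|g\|=1$ lies between the smallest and the largest eigenvalue, and both extremes are attained at the corresponding eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal((5, 5))
p = (a + a.T) / 2                              # self-adjoint operator on H = R^5
eigvals, eigvecs = np.linalg.eigh(p)           # eigenvalues in ascending order
lam_low, lam_high = eigvals[0], eigvals[-1]

g = rng.standard_normal((10000, 5))
g /= np.linalg.norm(g, axis=1, keepdims=True)  # 10000 random unit vectors
rayleigh = np.einsum("ij,jk,ik->i", g, p, g)   # <P g, g> for every row g

print(lam_low, rayleigh.min(), rayleigh.max(), lam_high)
```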

Now we state the Spectral Theorem for linear bounded self-adjoint operators. It is an analogue of the finite-dimensional Spectral Theorem for matrices.
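Proposition A.2 (Spectral Theorem). Let $P\in\mathcal L(H)$ be self-adjoint and $k\in\mathbb N$. Then there exists a uniquely determined spectral measure $E$ with compact support $\operatorname{spec}(P|H)$ such that

$$\langle P^k f, g\rangle = \int_{\operatorname{spec}(P|H)} \alpha^k\, d\langle E_{\alpha} f, g\rangle,\qquad f,g\in H. \tag{A.1}$$

Let $F\colon[\lambda,\Lambda]\to\mathbb R$ be a continuous function. Then the continuous functional calculus yields a self-adjoint operator $F(P)\in\mathcal L(H)$ with

$$\langle F(P)f, g\rangle = \int_{\operatorname{spec}(P|H)} F(\alpha)\, d\langle E_{\alpha} f, g\rangle,\qquad f,g\in H, \tag{A.2}$$

and

$$\|F(P)\|_{H\to H} = \max_{\alpha\in\operatorname{spec}(P|H)} |F(\alpha)|.$$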

Remark A.3. In most of the literature the case where $H$ is a complex Hilbert space is considered. In [KG82] both real and complex Hilbert spaces are treated. Note that the integrals in (A.1) and (A.2) are defined with respect to a signed measure.

A.2. Interpolation Theorem 

We state a version of the Theorem of Riesz-Thorin. For a proof and further details we refer to [BL76, BS88]. Let $L_p = L_p(D,\pi)$ for a probability measure $\pi$ on a measurable space $(D,\mathcal D)$.

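Proposition A.4 (Theorem of Riesz-Thorin). Let $1\le p,q_1,q_2\le\infty$. We assume that $\theta\in(0,1)$ and

$$\frac{1}{p} = \frac{1-\theta}{q_1} + \frac{\theta}{q_2}.$$

Further let $T$ be a linear operator from $L_{q_1}$ to $L_{q_1}$ and at the same time from $L_{q_2}$ to $L_{q_2}$ with

$$\|T\|_{L_{q_1}\to L_{q_1}}\le M_1 \quad\text{and}\quad \|T\|_{L_{q_2}\to L_{q_2}}\le M_2.$$

Then

$$\|T\|_{L_p\to L_p}\le 2\,M_1^{1-\theta}M_2^{\theta}.$$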



Remark A.5. We can replace the function spaces $L_p$, $L_{q_1}$, $L_{q_2}$ in the last proposition by the sequence spaces $\ell_p$, $\ell_{q_1}$, $\ell_{q_2}$, and the result remains the same.

Remark A.6. Note that we consider real-valued functions. For functions mapping into the complex numbers the same result holds true; in that case the additional factor of two in the assertion is not needed.
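For matrices acting on finite sequence spaces (Remark A.5), take $q_1=1$, $q_2=\infty$ and $\theta=1/2$, so that $1/p = 1/2$. The proposition then bounds $\|T\|_{2\to2}$ by $2\sqrt{M_1M_\infty}$, where $M_1$ is the maximal absolute column sum and $M_\infty$ the maximal absolute row sum. The sketch checks the sharper bound $\sqrt{M_1M_\infty}$ without the factor two, which in this special case also follows from the Schur test and therefore implies the stated one; the random matrix is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(3)
t = rng.standard_normal((6, 6))                # a linear operator on R^6
m1 = np.linalg.norm(t, 1)                      # ||T||_{1->1} = maximal column sum
m_inf = np.linalg.norm(t, np.inf)              # ||T||_{inf->inf} = maximal row sum
theta = 0.5
p = 1.0 / ((1.0 - theta) / 1.0 + theta / np.inf)   # 1/p = (1-theta)/1 + theta/inf
norm_p = np.linalg.norm(t, 2)                  # ||T||_{2->2} = largest singular value
bound = m1 ** (1.0 - theta) * m_inf ** theta
print(p, norm_p, bound, norm_p <= bound)
```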



A.3. Conductance and the Cheeger inequality 

Let $(D,\mathcal D)$ be a measurable space. Assume $K$ is a transition kernel on $(D,\mathcal D)$ which is reversible with respect to a probability measure $\pi$. The conductance of the transition kernel $K$ is defined by

$$\varphi(K,\pi) = \inf_{0<\pi(A)\le 1/2} \frac{\int_A K(x,A^c)\,\pi(dx)}{\pi(A)}.$$
Let $(X_n)_{n\in\mathbb N}$ be a Markov chain with transition kernel $K$ and initial distribution $\pi$. Then the numerator of the ratio within the definition of the conductance is the probability of $X_1\in A$ and $X_2\in A^c$. Hence one has

$$\Pr(X_2\in A^c \mid X_1\in A) = \frac{\int_A K(x,A^c)\,\pi(dx)}{\pi(A)}.$$

The conductance of $K$ is the infimum over sets $A\in\mathcal D$ with $0<\pi(A)\le 1/2$ of the probability that $X_2\in A^c$ under the condition that $X_1\in A$.
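On a small state space the conductance can be computed by brute force. The sketch assumes, as an example, the lazy random walk on the cycle $\mathbb Z_m$ with uniform $\pi$, and evaluates the numerator $\int_A K(x,A^c)\,\pi(dx)$ as $\Pr(X_1\in A, X_2\in A^c)$, exactly as in the interpretation above. For $m=8$ the infimum is attained at an arc of four states: two edges leave it, each carrying probability $\frac18\cdot\frac14$, so $\varphi = \frac{1/16}{1/2} = \frac18$.

```python
import itertools
import numpy as np

def lazy_cycle_walk(m):
    """Lazy random walk on Z_m: stay w.p. 1/2, move to each neighbour w.p. 1/4."""
    k = np.zeros((m, m))
    for x in range(m):
        k[x, x] += 0.5
        k[x, (x + 1) % m] += 0.25
        k[x, (x - 1) % m] += 0.25
    return k

def conductance(k, pi):
    """Brute force over all sets A with 0 < pi(A) <= 1/2 (small state spaces only)."""
    m = len(pi)
    best = np.inf
    for size in range(1, m):
        for a in itertools.combinations(range(m), size):
            ind = np.zeros(m)
            ind[list(a)] = 1.0
            pa = pi @ ind
            if pa <= 0.5 + 1e-12:
                flow = (pi * ind) @ k @ (1.0 - ind)   # Pr(X_1 in A, X_2 in A^c)
                best = min(best, flow / pa)
    return best

m = 8
k = lazy_cycle_walk(m)
pi = np.full(m, 1.0 / m)
print(conductance(k, pi))   # 0.125 for m = 8
```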

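The Markov operator $P$, given by $Pf(x) = \int_D f(y)\,K(x,dy)$, is self-adjoint on $L_2 = L_2(D,\pi)$, since $K$ is reversible with respect to $\pi$. For $f\in L_2$ let $S(f) = \int_D f(x)\,\pi(dx)$ and let

$$L_2^0 = \{f\in L_2 \mid S(f) = 0\}.$$

Furthermore we define

$$\Lambda = \sup\{\alpha \mid \alpha\in\operatorname{spec}(P|L_2^0)\}.$$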

The Cheeger inequality provides a relation between $\Lambda$ and the conductance $\varphi(K,\pi)$.

Proposition A.7 (Cheeger inequality). Let the transition kernel $K$ be reversible with respect to a probability measure $\pi$. Then

$$1-\Lambda \ge \frac{\varphi(K,\pi)^2}{2}. \tag{A.3}$$

For a proof of the inequality on finite state spaces we refer to [Beh00, Theorem 11.3, p. 93]. The Cheeger inequality for general state spaces is proven by Lawler and Sokal in [LS88, Theorem 3.5, p. 570] and by Lovász and Simonovits in [LS93, Lemma 1.7, p. 374]. Lawler and Sokal provide different types of inequalities for Markov chains and Markov processes.
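A numerical check of the Cheeger inequality in the form $1-\Lambda\ge\varphi(K,\pi)^2/2$ on small random reversible chains follows. The construction $K(x,y)=w(x,y)/r(x)$ with symmetric positive weights $w$ and $r(x)=\sum_z w(x,z)$ is an assumption chosen because it is reversible with respect to $\pi(x)\propto r(x)$. $\Lambda$ is read off from the symmetric matrix $w(x,y)/\sqrt{r(x)r(y)}$, which is similar to $K$; its largest eigenvalue $1$ belongs to the constants and is skipped.

```python
import itertools
import numpy as np

def reversible_chain(m, seed):
    """Random reversible chain K(x, y) = w(x, y) / r(x) with symmetric weights w > 0."""
    rng = np.random.default_rng(seed)
    w = rng.random((m, m)) + 0.01
    w = w + w.T                                # symmetric, strictly positive weights
    r = w.sum(axis=1)
    k = w / r[:, None]                         # row-stochastic transition matrix
    pi = r / r.sum()                           # pi(x) K(x, y) = w(x, y) / sum(w): reversible
    return k, pi, w, r

def conductance(k, pi):
    """Brute-force phi(K, pi) over all A with 0 < pi(A) <= 1/2 (small m only)."""
    m = len(pi)
    best = np.inf
    for size in range(1, m):
        for a in itertools.combinations(range(m), size):
            ind = np.zeros(m)
            ind[list(a)] = 1.0
            pa = pi @ ind
            if pa <= 0.5 + 1e-12:
                best = min(best, (pi * ind) @ k @ (1.0 - ind) / pa)
    return best

def second_eigenvalue(w, r):
    """Lambda = sup spec(P | L_2^0) via the symmetric matrix w(x, y) / sqrt(r(x) r(y))."""
    s = w / np.sqrt(np.outer(r, r))            # similar to K, hence same eigenvalues
    return np.linalg.eigvalsh(s)[-2]           # skip the top eigenvalue 1 (constants)

for seed in range(5):
    k, pi, w, r = reversible_chain(6, seed)
    lam, phi = second_eigenvalue(w, r), conductance(k, pi)
    print(seed, 1.0 - lam, phi ** 2 / 2, 1.0 - lam >= phi ** 2 / 2)
```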